ABSTRACT



This report addresses assessment of college student
performance. Discussion focuses on why classroom assessment of students'
achievement is important; how an instructor can ensure the quality of
information from classroom assessments; methods of assessment particularly
suited to various achievement targets; how the results of several assessments
can be meaningfully combined into one composite grade; ways for faculty to
improve assessment skills; and conclusions about assessment from a review of
the literature. The report describes five different kinds of learning goals
or "achievement targets," and appropriate forms of assessment for each. These
learning goals are: (1) knowledge of facts and concepts (recall); (2)
thinking, reasoning, and problem solving using one's knowledge; (3) skill in
procedures or processes; (4) constructing projects, reports, artwork, or
other products; and (5) dispositions, such as appreciating the importance of
a discipline. Following an introduction, individual chapters discuss defining
student learning for assessment, ensuring the quality of classroom assessment
information, options for classroom assessment, assessment in the disciplines,
grading, grade distributions and grading policies, and conclusions and
further resources. (Contains 76 references.) (DB)

EXECUTIVE SUMMARY



How does an instructor know whether students are learning
what the instructor is trying to teach them? How do students
find out how they are doing, and can they use that information
to study more effectively? Would students be able to tell
what the instructor thinks is important for them to learn by
looking at the assignments that “count” in a course? Good
assessment yields good information about the results of
instruction; it is itself a necessary component of good
instruction. Students who do not understand what they are aiming
to know and how they will be expected to demonstrate their
achievements will not be able to participate fully in managing
their own learning. Sound assessment and grading practices
help teachers improve their own instruction, improve
students’ motivation, focus students’ effort, and increase
students’ achievement.

“Assessment” means gathering and interpreting information
about students’ achievement, and “achievement” means the
level of attainment of learning goals of college courses.
Assessing students’ achievement is generally accomplished
through tests, classroom and take-home assignments, and
assigned projects. Strictly speaking, “assessment” refers to
assignments and tasks that provide information, and “evaluation”
refers to judgments based on that information.

Why Is Classroom Assessment of Students’
Achievement Important?

Students should be able to tell what the instructor thinks is
important for them to learn by looking at a course’s tests,
projects, and other assignments. These assessments are an
instructor’s way of gathering information about what students
have learned, and the instructor can then use that information
to make important decisions about students’ grades, the content
of future lessons, and the revision of the structure or content
of a course or program. Thus, it is important that student
assessments in higher education classes give dependable information.

How Can an Instructor Ensure the Quality of 
Information From Classroom Assessments? 

Information from classroom assessments (grades, scores, and
judgments about students’ work resulting from tests, assignments,
projects, and other work) must be meaningful and
accurate (that is, valid and reliable). The results of assessment
should be indicators of the particular learning goals for the
course, measuring those goals in proportion to their emphasis
in the course. An instructor should be confident that students’
scores accurately represent their level of achievement.

The Art and Science of Classroom Assessment describes
five different kinds of learning goals or “achievement targets”:
knowledge of facts and concepts (recall); thinking,
reasoning, and problem solving using one’s knowledge; skill
in procedures or processes, such as using a microscope;
constructing projects, reports, artwork, or other products;
and dispositions, such as appreciating the importance of a
discipline. Different methods of assessment are better suited
for measuring different kinds of achievement.

What Methods of Assessment Are Particularly Suited to 
Various Achievement Targets, and How Are They 
Constructed, Administered, and Scored? 

Four basic methods of assessment are presented: paper-and-pencil
tests, performance assessments, oral questions, and
portfolios. Paper-and-pencil tests are the most commonly
used form of assessment in higher education. Performance
assessments are tasks and associated scoring schemes (“rubrics”)
that require students to make or do something whose
quality can be observed and judged. Oral questions are commonly
asked in the context of classroom discussions, more
often in smaller seminar-style classes than in large lecture
sections. Portfolios are collections of students’ work over
time, according to some purpose and guiding principles; they
usually include students’ reflection on the work. The Art and
Science of Classroom Assessment provides suggestions about
writing good tests, performance tasks, oral questions, and
portfolio specifications, and about constructing scoring
schemes that examine performance according to learning
goals. Two kinds of scoring (objective, requiring a right/wrong
or yes/no decision, and subjective, requiring judgments
of quality along a continuum) are described, along with
principles for devising scoring schemes and examples.

How Can the Results of Several Assessments 
Be Meaningfully Combined Into 
One Composite Grade? 

Grading usually requires constructing one score or judgment
from several scores on various assignments and tests. The
combination must be valid and appropriately weight the
scores of various components according to their places in
the instructor’s intentions for the course. A set of good
assessments can be rendered into an invalid grade if the
individual scores are not carefully combined. Four methods of
determining final grades serve different grading purposes an
instructor might intend, depending on the course: the median
method, weighted letter grades, total possible points,
and holistic rating.
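
As a rough illustration of how the choice of method can change a composite grade, the sketch below works through three of the four methods in Python. The summary names these methods without defining them, so the definitions used here are assumptions: the median method is taken to be the middle letter grade, weighted letter grades a weighted mean of grade points, and total possible points the share of points earned. The 4-point scale, the function names, and the sample scores are all hypothetical. Holistic rating is a matter of judgment and is not modeled.

```python
from statistics import median_low

# Hypothetical 4-point scale; the report does not prescribe one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
POINTS_TO_LETTER = {points: letter for letter, points in GRADE_POINTS.items()}


def median_grade(letters):
    """Median method (assumed definition): the middle letter grade.

    median_low returns an actual data value, so an even number of
    grades never averages into a score that is not a letter.
    """
    return POINTS_TO_LETTER[median_low(GRADE_POINTS[g] for g in letters)]


def weighted_letter_grade(letters_and_weights):
    """Weighted letter grades (assumed definition): weighted mean of
    grade points, mapped to the nearest letter. Exact ties go to the
    higher letter because GRADE_POINTS lists higher grades first.
    """
    total_weight = sum(weight for _, weight in letters_and_weights)
    mean = sum(GRADE_POINTS[g] * w for g, w in letters_and_weights) / total_weight
    nearest = min(GRADE_POINTS.values(), key=lambda p: abs(p - mean))
    return POINTS_TO_LETTER[nearest]


def total_points_percent(earned_and_possible):
    """Total possible points (assumed definition): points earned as a
    percentage of all points available across assignments.
    """
    earned = sum(e for e, _ in earned_and_possible)
    possible = sum(p for _, p in earned_and_possible)
    return 100.0 * earned / possible


# Hypothetical student records.
print(median_grade(["B", "A", "C", "B", "F"]))
print(weighted_letter_grade([("B", 0.2), ("A", 0.3), ("C", 0.5)]))
print(total_points_percent([(45, 50), (80, 100), (38, 50)]))
```

In the first example a single F does not move the median, whereas an unweighted mean of the same grade points would be pulled down to 2.4. Differences of this kind are why the method of combination should match what the instructor intends the grade to mean.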

The topic of grading is found in the higher education
literature, largely under discussions or studies of “grade
inflation.” A review of the recent literature on grade inflation
may yield some surprises for readers. Although grade inflation
is a concern at the present time, earlier in this
century writers expressed some concern about grade deflation.
Several authors have raised related issues that suggest
the topic is more multifaceted than the straight-line function
the term “inflation” implies: issues about the nature of
education, differences in grades among the disciplines, and the
noncomparability of grades in different historical periods.

In What Areas Might Faculty Improve Their Assessment 
Skills, and What Resources Are Available to Help? 

Assessment of students’ work in higher education classrooms
is important, and important to do well. One science professor
has been heard to comment that professors sometimes
measure the specimens in their labs more accurately than
they measure the students in their classrooms, yet important
human consequences follow from both. Faculty members
who wish to improve their skills in assessment can find some
good resources already available, some of the best of which
are recent books and articles, and easily obtained materials
on the Internet. The Art and Science of Classroom Assessment
summarizes some of what the author thinks are the best
“next step” resources for readers.

What Conclusions Can Be Drawn From the 
Review of the Literature? 

The literature on principles of classroom assessment has been
written mostly for K-12 education. The Art and Science of
Classroom Assessment uses examples and discusses assessment
contexts relevant to college courses and young (and
not-so-young) adult students. Empirical studies of classroom
assessment in higher education underscore the importance of
instructors’ fairness, clarity in tests, assignments, and scoring,
and clear descriptions of the achievement target or learning
goal in higher education classrooms. More studies are needed
that investigate the needs, types, results, and effectiveness of
assessment in higher education and that tie the findings to
theories about adult learners. Some excellent resources
presently exist for helping instructors design and conduct
valid, reliable, fair, and interesting assessments of students’
work, a crucial function in higher education classrooms.

FOREWORD



In her opening address to the Association for the Study of
Higher Education in 1998, Yvonna Lincoln encouraged higher
education professionals to embrace an ethic of love in their
research, teaching, service, and administrative responsibilities.
This ethic can help us to conduct our activities in ways that
are fair, equitable, and compassionate. One illustration of this
ethic is to provide students with detailed feedback and
assessment of their work, even though it can be time consuming
and difficult, with few rewards. The benefit for students
and learning can be significant.

One argument for the value of classroom assessment of
students is that an instructor can know whether students are
learning what the instructor is trying to teach them, and students
can find out how they are doing and then use that information
to study more effectively. Good teachers care
about these issues, and state legislatures, education commissions,
coordinating boards, trustees, and others are calling for
assessment of students’ learning. Administrators are paying
closer attention to faculty members’ assessment of students;
program and college procedures are being established. But
one of the most compelling arguments for the need to improve
faculty members’ classroom assessment is that it helps
to improve students’ learning! More and more studies demonstrate
that students who participated in a class where grading
was based on performance increased their competency.

Susan M. Brookhart, associate professor of education at
Duquesne University, has written a compelling description of
effective assessment of students’ achievement in college and
university classes. She describes the importance of learning
goals or achievement targets as a necessary first step toward
classroom assessment and reminds us that assessment should
be planned at the same time the syllabus is prepared. One of
the most important principles of The Art and Science of Classroom
Assessment is that assessment is part of effective instruction.
It is a part of planning, instruction, and assessment.

Brookhart clearly defines terms and shows how to use them
through examples, summarizes the literature on classroom
assessment in higher education, details methods of assessment,
delineates the development of good assessment instruments
and scoring procedures, reviews grading strategies, and
provides several models for achieving the goal of quality
classroom assessment and resources for faculty to improve
assessment skills. A quality product results if the feedback is
valid and reliable. The Art and Science of Classroom Assessment
provides the needed evidence, resources, and advice to
guide new and experienced faculty members, including many
informative tables with concrete examples.

The Art and Science of Classroom Assessment provides insights
into three areas challenging the academy: (a) professional
standards of assessment; (b) outcomes assessment; and
(c) grade inflation. No professional standards exist for assessment
of students in higher education, and it is hoped this
book will help to establish a set of standards for faculty members
to follow and to model for graduate students, our future
teachers. Perhaps it can begin a dialogue about developing a
set of standards for assessing students’ work. Brookhart notes
that classroom assessment needs to be linked to institutional
and state assessment of student outcomes, especially given
that accreditors and state legislatures are requiring proof of
students’ learning before allocating funds. With regard to the
issue of grade inflation and grading policies, the rise in the
average grades of students in college is an issue of accountability
that should be addressed. In an increasingly litigious
society, grading policies are becoming increasingly important.

Several other ASHE-ERIC Higher Education Reports provide
additional perspectives on assessment. Karen and Karl
Schilling’s Proclaiming and Sustaining Excellence (vol. 26,
no. 3) provides an overview of the assessment movement
and its impact on the academy, and examines important
issues for chairs and deans. Elizabeth Creamer’s Assessing
Faculty Publication Productivity (vol. 26, no. 2) addresses
the assessment of faculty. Mimi Wolverton’s A New Alliance:
Continuous Quality and Classroom Effectiveness (no. 6)
examines seven institutions’ use of the Malcolm Baldrige
standards for increasing effectiveness in the classroom. And
Lion Gardiner’s Redesigning Higher Education: Producing
Dramatic Gains in Student Learning (vol. 23, no. 7) details
how assessment can improve students’ learning.

It is hoped that The Art and Science of Classroom Assessment
will provide the inspiration and guidance faculty need
to assess students’ work accurately and fairly.

Adrianna J. Kezar 

Series Editor, 

Assistant Professor of Higher Education, and
Director, ERIC Clearinghouse on Higher Education

INTRODUCTION



The Purpose of Assessing Students 

How does an instructor know whether students are learning
what he or she is trying to teach them? How do students find
out how they are doing, and can they use that information to
study more effectively? Can students tell what the instructor
thinks is important for them to learn by looking at the assignments
that “count” in a course? Good teachers care about
such questions. Sound assessment and grading practices help
teachers improve students’ motivation, effort, and achievement.
Sound assessment makes it easier to design and deliver
good instruction and to describe its results.

“Assessment” is defined here as gathering and interpreting
information about student achievement; “student achievement”
is defined as level of attainment of learning goals in
college courses, and its assessment is generally accomplished
through tests, classroom and take-home assignments, and
projects that students undertake to provide information about
what they are learning. Strictly speaking, “assessment” refers
to assignments and tasks that provide information, and “evaluation”
refers to judgments based on that information.

The need for quality information 
in student assessment 

Assessing students’ achievement in higher education classrooms
provides vital information for several different purposes.
It provides feedback to students that fosters learning
and provides information to the professor about students’
achievement of learning goals for the course. Assessment
provides the basis for students’ grades for a course, which in
turn seriously affect students’ progress through higher education,
future course selections, and vocational and avocational
choices. It provides the basis for instructors’ evaluation
and adjustment of their own teaching. Assessment by
students of a course can be part of the evaluative information
considered by the program of which the course is a
component. Because important decisions are based on information
derived from classroom assessments, it is imperative
that the information be of high quality: accurate, dependable,
meaningful, and appropriate.

This monograph is about assessing students’ achievement
in college and university classes. It is not about university
outcomes assessment or program evaluation, although both
of these purposes depend on regular, well-done classroom
assessment. It is written primarily for college and university
faculty, but a secondary audience would include college and
university administrators, who may find that knowledge of
this topic will help them deal with questions or problems
their faculty or students may have. This monograph will
answer several questions:

1. Why is classroom assessment of students’ achievement 
of course learning goals important? 

2. How can one ensure the quality of information derived 
from classroom assessment? 

3. What assessment methods are particularly suited to vari- 
ous targets, both in general and in academic disciplines? 
How are they constructed, administered, and scored? 

4. How can the results of several different assessments be
meaningfully combined into one composite course
grade for a student?

5. In what areas might faculty members improve their
assessment skills, and what resources are available to
them for doing so?

Assessment as part of a model of instruction 

Assessment is generally considered one of three aspects of
instruction (along with planning and teaching). Many different
theorists and practitioners of instruction have described
models of instruction (see, e.g., Kubiszyn & Borich, 1993).
Most have a similar, three-part construction. Sometimes the
model is linear: Planning informs instruction informs assessment.
But that description is overly simplistic, and a more
accurate way of thinking about teaching would involve a set
of bidirectional relationships. Planning and decisions about
what students should learn inform choices about what kinds
of instructional activities will help students learn the particular
knowledge, skills, thinking and reasoning strategies, and so
on. But the availability of certain instructional activities, for
example, the existence of a good film or a well-tested exercise
on a certain topic, sometimes informs planning. Planning
decisions about what students are to learn should inform
assessment, but sometimes the results of assessments describe
students’ knowledge in progress and result in changing plans,
for example, to reteach material that was not learned. Assessment
can and should be integrated with instruction and
should inform both instruction and ongoing course planning.
This report illustrates how this relationship can happen when
assessments are valid and reliable, that is, when they give
appropriate, meaningful, and accurate information.

The classroom assessment environment 

The way an instructor approaches assessment affects the way
students perceive a class, the material for study, and their
own work (Brookhart, 1997; Rodabaugh & Kravitz, 1994;
Stiggins & Conklin, 1992). Eight aspects of the classroom
assessment environment have been identified, based on research
in public school classrooms: purposes of assessment,
methods of assessment, selection criteria, quality of assessment,
feedback, the teacher’s characteristics, students’ perceptions,
and the policies for assessment under which teachers
work (Stiggins & Conklin, 1992).

The idea of a classroom assessment environment is important
because it focuses instructors on the impact of their
approaches to assessment on students’ motivation and achievement.
Table 1 presents interview questions from a study of
the assessment environment in college classrooms (Brookhart,
1997), which are useful for readers’ reflection on their
own approaches to assessment as they begin to read this
monograph.

Organization of This Monograph 
Principles of educational assessment 

This monograph summarizes work in educational measurement
and instruction, which together suggest principles for
classroom assessment that enhances instruction and produces
meaningful information about students’ achievement. The
principles draw on theoretical and empirical work broadly
applicable to all levels of learning. The aim of the next six
sections is to present a readable summary of these principles,
enough for a good orientation to classroom assessment in
higher education, but not enough to substitute for a thorough
study of educational measurement. (Readers who wish to
pursue the topic of student assessment in higher education
will find resources listed in the last section.)

Student assessment should be multidimensional, and it 
should be the focus of ongoing communication with stu- 
dents about their achievement of the course’s objectives. 

The methodology for classroom assessment can be thought
of as a toolkit that faculty use for accomplishing their purposes.
This report presents such a toolkit, organized according
to a modified version of a framework for understanding
types of classroom assessment (see Stiggins, 1992, 1997).
Assessment methods can be grouped into four general categories:
paper-and-pencil tests, performance assessments of
processes or products, oral communication, and portfolios.
For each category, objective (right/wrong or present/absent)
and subjective (judgment of degree of quality) scoring can
be developed. Different assessments are necessary to cover
the full range of achievement targets: knowledge, thinking,
processes, products, and dispositions.

TABLE 1

Questions for Instructors’ Reflection on Assessment

1. For what purposes do you use assessment? (grades, grouping,
diagnosis, motivation, evaluation of instruction, communicating
expectations, planning instruction)

2. What types of assessment methods do you use? (tests, quizzes,
performance assessments, oral questioning, assignments, student
peer ratings, self-ratings)

• In what proportion do you use these assessments?

• What kinds of performance do you assess? (recall, analysis,
comparison, inference, evaluation)

• How do you deal with cheating?

• Do you assess students’ dispositions as well as achievement?
(motivation, interest, maturity, study skills) Formally or
informally?

3. What criteria do you use to select an assessment? (results fit
purpose, method matches instruction, ease of development,
ease of use, ease of scoring, origin of assessment, time required,
degree of objectivity, thinking skills tapped, effective
control of cheating)

4. How would you judge the quality of your assessments for giving
you the information you need about students’ achievement?

5. How do you give feedback to students about their performance?
(written, oral, grades, informal)

6. How do you view the role of teacher? (knowledge presenter,
facilitator of student-constructed learning) How do you view
the role of student? (cooperative, competitive) How much of
students’ failure or success do you attribute to the student? To
the teacher?

7. How do you view the characteristics of the students in your
class(es)? (ability, work habits, maturity, social skills, willingness
to perform, need for feedback, self-assessment skills,
sense of fairness, reaction to testing/assessment, expectations
of the teacher)

8. Are there assessment policies you must follow? Describe them.

Review of higher education literature 

Although student assessment in higher education should be
informed by general principles of assessment and instruction,
some features of higher education make it a special context
for classroom assessment: widely varying class sizes, the
noncompulsory nature of enrollment, the possibility of a
grade of Incomplete, and the fact that students are adults.

This report also presents a review of literature specific to
student assessment in higher education, including some general
essays and studies (“Defining Student Learning for Assessment”),
some discipline-specific literature (“Assessment in
the Disciplines”), and some literature on grading in higher
education (“Grade Distributions and Grading Policies”).

The author, with the help of the staff of the ERIC Clearinghouse
on Higher Education, reviewed the literature on assessment
in higher education classes by first searching the ERIC
database for materials from 1985 to the present that included
the phrase “classroom assessment” in the title, abstract, or
descriptor and “higher education,” “colleges,” or “universities”
in the descriptor. The search identified resources about student
assessment in higher education classes. Two categories
of literature resulting from the search were set aside because
the focus of this report is assessment of students’ achievement
of learning goals for the course: (a) resources that were basically
about program evaluation or outcomes assessment,
where the interest was in aggregated group achievement for
institutional purposes; and (b) resources about classroom
assessment techniques (see Angelo & Cross, 1993; K. Cross &
Angelo, 1988), where assessment is anonymous and the unit
of analysis is the class, not individual students. The assessment
of students’ achievement can be formative and not included in
the course grade, or official and summative, counting in a
course grade, but the individual student’s name is known and
individual progress or achievement is the concern.

Some sources identified by hand were added to the literature
on student assessment from the ERIC review. Recent
issues of Journal on Excellence in College Teaching and
College Teaching and recent issues of discipline-specific journals
from professional organizations of teachers, such as
Journal of College Science Teaching and College English, were
examined. A search of Dissertation Abstracts International
yielded one useful recent reference. The reference lists in
some of the articles identified additional useful sources.

Professional Standards for Student Assessment 

Professional standards for teachers’ competency in classroom
assessment exist for K-12 teachers (American Federation,
1990; Joint Advisory Committee, 1993). Their content provides
a useful springboard for discussing both professional
competence and fairness in assessment in higher education.
Standards for Teacher Competence in Educational Assessment
of Students (American Federation, 1990) states that teachers
should be skilled in choosing and developing assessment
methods appropriate for instructional decisions; administering,
scoring, and interpreting the results of assessments; using
assessment results when making decisions about individual
students, planning teaching, developing the curriculum,
and improving programs; developing valid grading procedures
based on assessments of students; communicating the
results of assessment; and recognizing unethical, illegal, and
inappropriate methods and uses of information about assessment.
Some discipline-specific professional standards also
exist for K-12 teachers (e.g., National Council, 1989) that are
applicable for the college level.

The Joint Committee on Standards, sponsored by 16 professional
educational organizations spanning elementary
through postsecondary levels, is preparing Standards for
Evaluation of Students, to be ready about 2002. These standards
are grouped into four categories: propriety, utility,
accuracy, and fairness. The propriety standards state that
evaluations of students should be conducted according to
sound educational principles and meet the educational and
informational needs of students as well as their instructors
and institutions (Joint Committee, 1998). Further, they call
for formal, written policies and procedures for evaluating
students. The other three categories of standards, namely utility
(or practicality and usability), accuracy, and fairness, also
contribute to sound assessment in postsecondary education.
Readers who ground their assessments in the principles
described in this report will be aligning their practice of
assessment with professional standards.

DEFINING STUDENT LEARNING FOR ASSESSMENT

The goals for learning in an academic course should be specified
from the outset, and then those goals should be the
focus of student assessment. Whether the instructor calls them
“course objectives,” “goals,” or something else, every instructor
should be able to articulate what he or she intends students
to learn (Walvoord & Anderson, 1998). Sometimes, for
example, for an introductory or survey course, those goals
will be very structured and centered on comprehension and
application of basic concepts. Sometimes, as for an advanced
seminar, those goals will be broad. For example, the goals for
a senior seminar in educational psychology might be for the
student to read and understand literature in an area he or she
chooses within the broad domains of either cognition or motivation.
But the instructor still must specify what the student is
meant to accomplish and then assess to what extent that happens.
In this example, the instructor would need to find ways
to decide whether and how well a student had selected, read,
and understood an appropriate body of literature.

College instructors have been known to comment on what
they might term the restrictive nature of such “outcomes-driven”
instruction that does not leave room for “creativity.”
This view represents a narrower interpretation of the
terms “objectives” and “goals” than is helpful. The existence
of intentions for instruction does not mean that students’ original
thought is precluded; original application of ideas may
well be considered one of the instructional aims for a course.

Some instructors are used to focusing on content when
planning a course. They are more likely to ask themselves
“What material should I present?” than “What do I want my
students to learn?” This approach makes assessment difficult.
Worse, it changes the focus of teaching and learning from
one in which intentional learning is expected of students
and facilitation of that learning (including, but not limited to,
presenting material) is expected of the instructor, to one in
which “receiving” material is expected of the students. After
they have received the material presented, the students must
figure out what to do with it. Students are out of luck if at
exam time their answers to a question do not match the
instructor’s take on it.

Many professors who think in terms of presenting material
really do have a set of concepts they want their students
to understand in certain ways, and they can express their
intentions for their course in the form of goals or objectives
if they stop and think about it. The message of this report
on assessment is that this is a good thing to do! Put those
goals in the syllabus, share them with students, and then
make sure that their learning experiences and assessments
match them. Doing so does not make the material any easier
(although it should make the course a little “easier” for the
students to deal with); it doesn’t “give away the farm.” But it
does keep everyone, both instructor and students, focused
on what is important to learn. It gives purpose to class activities,
assignments, and assessments. When students understand
the purpose behind an assessment, even a difficult
one, they are less likely to complain, more likely to tackle it
sincerely, and more likely to learn from it (Covington, 1992).

In this report, the terms “instructional objectives,” “learning 
goals,” and “achievement targets” are used interchangeably 
to mean the instructor’s intentions for students’ learning 
of course material. Some writers use the term “goals” for 
relatively broad learning intentions and “objectives” for more 
specific ones. Because this report is aimed toward a broad 
range of postsecondary instructors and their courses, general 
use of these terms is appropriate. What is a broad goal in 
one course may be a more specific objective in another 
context. Learning objectives or goals have been characterized 
as “achievement targets” (Stiggins, 1992, 1997). This 
metaphor is an apt one, because it captures how it feels to 
be a student “shooting for” something, with all its connotations 
of effort, care, and aim, and it also captures something 
of the “prize” of learning, of hitting the bull’s-eye. 




Kinds of Achievement Targets 

Achievement targets come in several varieties, described as 
knowledge, thinking, products, skills, and dispositions (Stig- 
gins, 1997). Most college courses have achievement targets 
in several of these categories. Some assessments are more 
effective than others for gathering information about students’ 
learning in a particular category. 

Knowledge of facts and concepts is basic to most college 
learning. For example, it may be important for students in 
an algebra course to know that the point of intersection of 
lines charting the profitability functions of two choices in a 
business venture is the point at which both choices would 
yield the same profit. In a European history course, it may 
be important for students to know that Hitler was the dictator 
of Germany from 1934 to 1945 (fact) or that his ascendancy 
may be explained by the theory that a “power vacuum” 
existed in Germany at a time of deep economic distress 
(concept). 

In this information age, an interesting phenomenon is 
occurring with regard to knowledge of facts and concepts. 
There are way too many facts to remember, and some of a 
good education consists of knowing which facts and con- 
cepts are important to commit to memory, available for in- 
stant retrieval any time, and which facts and concepts are 
best “memorized” by knowing where to look for them in 
books, computer files, or other resources. Learning what is 
acceptable to “forget” and learning how to let go of that 
material is a skill students’ grandparents did not need as 
much as today’s students do. 

Assessment of students’ knowledge of facts and concepts 
is probably the most straightforward, and certainly the most 
common, form of assessment in college courses. The current 
trumpeting about “higher order thinking,” important though 
it is, should not be read to mean that knowledge of facts 
and concepts is not important. It means a balance must be 
achieved in teaching knowledge, skills needed to obtain 
more knowledge, and strategies or methods for constructing 
new knowledge. 

Thinking, understanding, and applying concepts, some- 
times called “higher order thinking,” also form part of the 
learning goals for most college courses. It is not enough to 
know and be able to recognize or even recall a fact unless 
one knows what to do with that fact once it is retrieved from 
memory or from resource material. Thoughtful application of 
concepts learned has long been a goal of higher education 
and, indeed, is a traditional hallmark of an educated person. 

Students who understand concepts can use them to reason, 
argue, persuade, explain, illustrate, and discuss a variety of 
statements, scenarios, points of view, and the like. Students 
who understand concepts can use them to solve novel problems. 
For assessment, it is important to emphasize that a student 
is not demonstrating understanding if the problem is not 
new to the student. If a chemical reaction problem on a 
chemistry exam is the same as one that was worked in class, 
the task is one of recall, no matter how difficult a recall task it 
may be. If the chemistry professor wants to assess whether 
students can use principles they learned about how chemical 
reactions occur to solve a chemical reaction problem, the 
problem has to be one the students have not seen before and 
one that can be solved by applying the particular principles 
the instructor intends to assess. (See “Options for Classroom 
Assessment” for a more detailed discussion of this principle.) 

Sometimes learning goals for students result in the produc- 
tion of academic products. A student in a creative writing class 
who produces a short story, a student in a biology lab who 
produces a project and a lab report, and a student in a history 
class who produces a videotape all share with the art student 
in the studio the characteristic that the major demonstration of 
their achievement is a product. Products vary in quality, and 
this quality can be judged (by the instructor, other students, 
and professionals) according to agreed-upon criteria. 

Some learning goals for students include assessment of 
procedural skills. Some areas more than others require the 
intentional teaching, learning, and assessing of skills, but all 
academic areas require the acquisition of skills. Biology stu- 
dents must learn how to use laboratory equipment safely, 
effectively, and with skill. Computer students must learn how 
to use keyboards, mice, and software. Math students must 
learn how to use graphing calculators. Social science students 
must learn how to use various kinds of maps. Nursing 
students must learn how to use hypodermic needles to give 
injections and sphygmomanometers to measure blood pressure. 
All students must learn how to use the library to find 
information. On and on the list goes. 

If procedural skills are an important part of learning goals 
for a course, the level of skill that a student has acquired 
must be assessed. Often, it can be done by performance 
assessment, in which the instructor or other students observe 
the student’s performance of a particular skill or set of 
skills on a task and rate, judge, or even describe in words 
the quality of the performance they have observed. 

Knowledge, thinking, products, and skills are the academic 
learning goals for most courses. Dispositions and interests are 
intended, if unstated, learning goals for many courses as well. 
For example, most instructors hope that their students develop 
an appreciation for the importance of their field and its 
contribution to humanity. Most instructors also harbor the 
hope that some of their students will develop a personal interest 
in their field, pursuing it further in their education and 
perhaps as their vocational choice. 



Most times, it is best that these dispositional learning goals, 
stated or not, not be assessed for inclusion in a course grade. 
After all, if a student knows that the “right” answer to the 
question “Are you interested in math?” is “Yes, very” and that 
that answer will earn an A, then all of a sudden many students 
will register interest! But it is reasonable, without the 
consequences of a grade attached, to assess students’ dispositions 
toward things that matter to the conduct of the class. 
The options for assessment described later include methods 
of assessing dispositions and interests as well as more academic 
targets. 

ERIC Resources 

Tables 2 and 3 summarize ERIC resources about the assess- 
ment of students in higher education (but not about grading, 
which is covered in “Grade Distributions and Grading Pol- 
icies”). Table 2 lists essays and descriptions about classroom 
assessment in higher education; Table 3 lists empirical stud- 
ies about classroom assessment in higher education. 

Many of the essays and descriptions in Table 2 are general 
guidelines for planning assessment (S. Brown, Rust, & Gibbs, 
1994; Community College of Vermont, 1992; Lantos, 1992; 
McTighe & Ferrara, 1994; Wergin, 1988). A theme they share is 
that assessment should be planned at the same time the syllabus 
is prepared. The first step in planning a course is to 
identify its purpose and the learning goals and intentions for 
students. The syllabus, assigned readings, plans for individual 
classes and lessons, tests, papers, and projects and other assignments 
should all relate to the learning intended for the 
course. This step is obvious when declarative knowledge is 
involved, when mastery of a body of concepts, facts, and generalizations 
is the goal for students. But the same principle 
applies when procedural knowledge (e.g., how to conduct 
historical research or perform laboratory experiments) or critical 
thinking and reasoning form part of the learning goals. The 
kinds of goals selected for a course have implications for the 
instruction and assessment that should go on, but the principle 
that goals, instruction, and assessment are related still holds. 

TABLE 2 

Essays About and Descriptions of Classroom Assessment 
in College and University Classrooms 

Source: S. Brown, Rust, & Gibbs, 1994 
Topic: For faculty development in assessment 
Main points: 
• Clear treatment with many examples of methods of assessment in higher 
education 

Source: Buchanan & Rogers, 1990 
Topic: Suggestions for dealing with classes of more than 80 with regard to essay 
testing, makeup exams, and writing new exam questions 
Main points: 
• Cafeteria approach, in which students can decide to take essay or objective 
final, depending on average beforehand. Only 6-7 students in 300 end up 
taking an essay final. 
• No makeups; students may drop lowest exam grade or drop the zero for a 
missed exam. 
• Student-generated test items tend to be conceptual and at least as good as 
textbook test-bank items but also cover class material. 

Source: Community College of Vermont, 1992 
Topic: Course planning 
Main points: 
• Part 3 describes how to plan a syllabus, instruction, and evaluation. 
• Recommends clear course objectives 
• Recommends criterion-referenced grading 

Source: Crouch & Fontaine, 1994 
Topic: Portfolios 
Main points: 
• Portfolio assessment changes the way students think about writing and write. 
• Describes portfolio use in a developmental writing program 
• Portfolios stress “reworking, rethinking, and revising.” 
• Assignments put in portfolio are not judged summatively until the end of 
the semester, so there is room for trial and improvement. 

Source: Glasgow, 1993 
Topic: Portfolios in a developmental writing course 
Main points: 
• Students became more reflective and confident about their reading and 
writing. 
• Gives objectives on the syllabus and describes students’ work on each one 
• Literacy autobiographies describe a student’s history as a writer, shared 
in reader response groups; oral and written responses, rewriting; focused 
correction from instructor 
• Grades based on papers plus “risk taking, changing, practicing writing, 
and peer editing” 

Source: Hackett & Levine, 1993 
Topic: Assessment methods 
Main points: 
• Symposium, oral exam, writing, journals, portfolios, and other specific 
suggestions 

Source: Harris 
Topic: Multicultural concerns in classroom assessment 
Main points: 
• Cultural concerns that get in the way of conventional college assessment 
include taboos against eye contact (as lacking respect for authority), 
competing, or standing out; values and experience with hands-on as 
opposed to abstract learning; oral as opposed to silent learning 
• Suggests using assessment techniques that build on those issues: peer 
feedback, learning logs (academic journals), learning file, minicapstone 
experiences (students reflect on any final assignment, what they learned, 
and what it means to them) 

Source: Lantos, 1992 
Topic: Expectations for written work should be “clear, demanding, positive, 
and enthusiastically held” to motivate students’ writing 
Main points: 
• Put general requirements in syllabus, specific requirements in an assignment 
sheet for each assignment; make a criterion list and spell out 
attributes of each criterion; use the criteria when grading the work; 
return work promptly 

Source: McClymer & Knoles, 1992 
Topic: Much of what students are asked to do on tests calls upon learning 
facts and strategies but not on understanding 
Main points: 
• Uses Perkins’s “thinking frames” (information, problem solving, epistemology, 
and inquiry) to describe academic thinking 
• Questions that allow students to approach a problem-solving task as 
an informational one encourage ersatz learning. 
• Acritical student coping mechanisms include clumps (amassing elements of 
critical analysis minus their logic: data packing, jargon packing, assertion 
packing) and shapes (using the logical forms of critical analysis without 
substance: borrowed analysis, surface analysis, or insistence upon a 
single thread of meaning), which can allow students to pass courses without 
mastering their meaning. Suggested solution is authentic testing. 
• We ask too much of students by giving them unauthentic questions to 
answer, implying there is a closure and right answer, and then ask too 
little by comments that tell them they are successful or partially successful 
if they use clumps or shapes. 

Source: McTighe & Ferrara, 1994 
Topic: Classroom assessment 
Main points: 
• A primer of classroom assessment principles and methods for all levels 
from preschool to graduate school 
• Primary purpose of classroom assessment is to improve students’ learning 
and inform learning. 
• Use multiple sources of information to assess learning. 
• Assure validity, reliability, and fairness in assessment. 
• Base assessment on intended learning outcomes and purpose, and 
audience for information. 

Source: Murray, 1990 
Topic: Tests should be learning opportunities 
Main points: 
• Various methods of using tests to teach are presented (some more 
valid than others): second-chance exams and grade algorithms, “brainbuster” 
exams to be done by groups, group multiple-choice tests, peer-mediated 
testing, paired testing, answer justification, take-home exams, 
immediate feedback, cost/benefit testing, alternate-forms retesting, and 
reaction or opinion papers. 

Source: O’Keefe, 1996 
Topic: Grades should be based on substantive comments on work 
Main points: 
• Turnaround time is important, but so are critical comments. 
• Experience with standard assignments (the example is a marketing 
case report) leads instructors to be able to anticipate most comments. 
• Sheets with numbered comment codes save time and let students see 
their comments. 

Source: Wergin, 1988 
Topic: Classroom assessment 
Main points: 
• A short primer on classroom assessment 
• Relevance versus control in assessment 
• Item writing for classroom tests 
• Validity and reliability 

Some of the articles describe specific methods of assessment, 
their purposes and uses. Portfolios (Crouch & Fontaine, 
1994; Glasgow, 1993) are a method that has long been 
used in the fine arts; other disciplines, most notably writing, 
are experimenting with this assessment format today. Tests 
(Buchanan & Rogers, 1990; McClymer & Knoles, 1992; Murray, 
1990) are an efficient way to gather information about 
students’ knowledge of facts, concepts, and generalizations, 
and at least a limited amount of information about how students 
can use those concepts to reason or solve problems. 









Tests are almost a necessity in very large classes, which can 
range from 80 or so students to several hundred (Buchanan 
& Rogers, 1990). Large introductory survey classes commonly 
include knowledge of a body of facts and concepts as impor- 
tant learning goals, and tests are an efficient use of assess- 
ment time for these classes. It is still important to make these 
classes as interactive as possible and to develop students' 
thinking skills (Walvoord & Anderson, 1998). 

A variety of activities can be used to assess students’ 
achievement of learning goals: symposia, oral exams, journals, 
papers, and other activities (Hackett & Levine, 1993). 
The key is not to use novelty for its own sake in assessing 
students but to ask of any task or activity that students might 
undertake what knowledge and skills the assignment would 
tap; what information about students' achievement of the 
course's learning goals would be available from their perfor- 
mance on the assignment; and what knowledge, skills, and 
understanding could not be assessed effectively using the 
method. Activities can often become learning experiences in 
their own right, serving double duty: Students learn from 
doing them, and the instructor learns about what students 
know and can do by reviewing them. Assessment and instruction 
are both served. 

Table 3 presents empirical studies of assessment of students 
in higher education. In general, these studies confirm 
and emphasize that the principles of good instruction and 
assessment are effective and have desired results when implemented 
in college classrooms. Students do not like norm-referenced 
grading in which their performances are compared 
with their peers. Rather, they prefer that their work be 
compared with a criterion or standard of quality, and they 
do better work under those circumstances (Jacobsen, 1993; 
O’Sullivan & Johnson, 1993). Students appreciate opportunities 
to work together on evaluations (Stearns, 1996), not 
least because their scores are higher when they do. Unfortunately, 
at least one survey (Guthrie, 1992) found that at 
three institutions (a research university, a comprehensive 
college, and a liberal arts college) many of the assessments 
used in courses tapped lower order cognition, such as recall 
of facts and concepts, not the higher order thinking for 
which the institutions were noted. 

TABLE 3 

Empirical Studies of Classroom Assessment 
in College and University Classrooms 

Study: IV Brown, 1994 
Context: MBA program, repeating exam questions for a second section 
of a course versus using new items 
Sample: 9 students 
Method: Performance for same and different items on test 
Findings: 
• Students did no better on items repeated from a previous section’s 
exam than on new items. 

Study: Guthrie, 1992 
Context: Examining the goals, modes, and evaluation of instruction 
for faculty whose students demonstrated gains in analytical reasoning 
Sample: 239 faculty: 92 from Stanford (research university), 100 from 
Ithaca (comprehensive college), 47 from Mills (women’s liberal arts 
college) 
Method: Survey 
Findings: 
• Evaluations emphasized the cognitive domain, put low weight on class 
participation. 
• Evaluations emphasized lower order cognitive skills, not the higher 
order skills for which their instruction was noted. 

Study: Hale, Shaw, Burns, & Okey, 19SJ 
Context: Science problem solving 
Sample: 150 students, 9th grade through college science and science 
education courses 
Method: Computer-simulated multiple-choice test 
Findings: 
• Test valid and reliable for assessing scientific problem solving 

Study: Jacobsen, 1993 
Context: Liberal arts college instructors noted by students to be 
exceptionally good at test preparation 
Sample: 15 sections, 13 instructors, 27 randomly selected students 
Method: Studied 15 sections (90th percentile or above in “preparing 
examinations” course evaluation): survey, interview, and administrative 
data 
Findings: 
• On average, small classes, upper-division courses, higher grades, 
performance classes, full professor and instructor ranks overrepresented 
but variability was apparent 
• Students liked methodical approach to evaluation, skill development in 
one defined area, some student choice, opportunity to explain answers; 
disliked comparison with other students 

Study: Larson, 199') 
Context: Portfolio use in baccalaureate colleges II 
Sample: 395 institutions, academic vice presidents or deans 
Method: Survey 
Findings: 
• 202 institutions reported using portfolios; 47% of them used 
portfolios for classroom assessment 
• Contents included papers, projects, journals, self-evaluations, 
faculty evaluations, videos, and drafts 

Study: O’Sullivan & Johnson, 1993 
Context: Using performance assessments in a graduate-level educational 
measurement course 
Sample: 29 students and 29 students in a comparison group 
Method: Questionnaire 
Findings: 
• Students who participated in a class where grading was performance 
based increased their competency. 

An interesting series of simulation studies based on scenarios 
on questionnaires (Rodabaugh & Kravitz, 1994) investigated 
four hypotheses by manipulating conditions in the scenarios. 
For example, they described a professor who did not 
return or discuss tests, did not discard ambiguous questions, 
and did not give partial credit for partially correct answers 
(“unfair condition”) to randomly selected subjects, and described 
a professor who did these things (“fair condition”) to 
other subjects. The four hypotheses were that (a) college 
students’ judgments of instructors would be affected by the 
grades instructors assigned, (b) college students’ judgments of 
instructors would be affected by the fairness of procedures 
instructors used, (c) the effect of procedural fairness would be 
stronger than the effect of grades, and (d) selection of instructors 
(“Would you take a course from this instructor?”) would 
be more strongly affected by perceptions of fairness than by 
perceptions of personal warmth, lecturing ability, or course 
difficulty (p. 71). For the first three hypotheses, the judgments 
investigated were (a) perceptions of the instructor’s caring for 
students, (b) perceptions of students’ respect for the instructor, 
(c) degree to which students reported liking the instructor, 
(d) perceptions of instructor’s fairness, and (e) reported 
likelihood of taking another class from the instructor, all measured 
on 6-point scales from very negative to very positive. 

Results were striking in their clarity: Procedural fairness af- 
fected students’ perceptions of the instructor and his or her 
course. Some evidence suggested that this effect was stronger 
for older students. Implications for an instructor’s behavior are 
obvious, most notably that fairness, especially in testing procedures, 
is very important to students (Rodabaugh & Kravitz, 
1994). A professor who is perceived as fair in testing and fair 
in establishing classroom policies will be respected, liked, 
perceived as caring, and likely to be chosen for another class. 
Conversely, a professor who is not perceived as fair will not 
be as well respected, liked, or chosen even if he or she gives 
relatively high grades. 

These findings are quite compatible with the instructional 
and assessment principles that stress that learning goals for 
students should be set intentionally, then clearly communicated 
to students at the outset of the course to maximize their 
motivation and learning, and that students’ learning is even more 
important than students’ liking the professor. Students should 
know where they are aiming when they do their work in the 
course: reading, writing, and all their activities and assignments. 
Even when students are listening to lectures, the learning 
goals they perceive are their intended direction will shape 
what they hear, how they comprehend, and how they conceptualize 
and store the information. Assessment should 
gather information about students’ achievement of those 
goals, measured against clear, fair, and clearly communicated 
standards of quality. Comparing students’ work to a standard 
of quality (criterion referencing) is the most appropriate 
form of assessment in most classrooms. Rodabaugh and 
Kravitz’s study demonstrated that students perceive criterion 
referencing as fair and appropriate. 
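
The difference between criterion referencing and norm referencing can be 
made concrete with a small sketch. The Python below is illustrative only: 
the cutoffs, the equal-sized rank bands, and the scores are hypothetical 
and are not taken from this report or the studies it cites. It grades the 
same set of scores once against fixed standards and once by rank within 
the class. 

```python
# Illustrative only: the cutoffs, letter bands, and scores below are
# hypothetical and do not come from the report or the studies it cites.

def criterion_referenced(scores, cutoffs=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Grade each score against fixed standards, ignoring classmates."""
    grades = []
    for score in scores:
        letter = "F"  # default when no cutoff is met
        for minimum, candidate in cutoffs:  # cutoffs run from highest to lowest
            if score >= minimum:
                letter = candidate
                break
        grades.append(letter)
    return grades


def norm_referenced(scores, letters=("A", "B", "C", "D", "F")):
    """Grade each score by rank: the class is split into equal-sized bands."""
    n = len(scores)
    # Indices of students ordered from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [None] * n
    for rank, student in enumerate(order):
        band = rank * len(letters) // n  # 0 is the top band
        grades[student] = letters[band]
    return grades


scores = [95, 92, 91, 90, 88]  # a uniformly strong (hypothetical) class
print(criterion_referenced(scores))  # ['A', 'A', 'A', 'A', 'B']
print(norm_referenced(scores))       # ['A', 'B', 'C', 'D', 'F']
```

In this made-up class every student meets a high standard, yet ranking 
alone spreads the same work across every letter grade, which is one way to 
picture why students might regard comparison with peers as less fair than 
comparison with a stated criterion. 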
ENSURING THE QUALITY OF CLASSROOM 
ASSESSMENT INFORMATION 



The information conveyed by grades, scores, ratings, or judgments 
of student assessments must be just that: information. 
Scores and grades must carry real meaning and be accurate 
indicators of that meaning. Further, the meaning of grades and 
scores must be appropriate for the purposes to which users of 
their information will put them. This section briefly defines 
the measurement principles of validity and reliability, shows 
how they apply to classroom assessment in higher education, 
and then discusses ways to enhance the validity (meaningfulness 
and appropriateness) and reliability (accuracy) of assessment 
information about students. It also includes practical 
suggestions about how to maximize the validity and reliability 
of classroom assessments that have been culled from a variety 
of resources (Linn & Gronlund, 1995; Nitko, 1996; Northwest 
Regional, 1994, 1998; Stiggins, 1997). Suggestions were chosen 
for their applicability to higher education, and examples illustrate 
uses in higher education classrooms. 

Validity 

Validity refers to the degree to which a score is meaningful 
and appropriate for its intended purpose. Validity refers, then, 
to whether and to what degree a score means what the instructor 
thinks it means, or says it means. Validity is a characteristic 
of a score put to a particular use, not a characteristic of 
a test or assessment itself. Messick (1989) distinguishes be- 
tween the purposes of interpreting and using scores. In higher 
education classrooms, almost all assessment information is 
used. Formative feedback to students intended to guide their 
studying and summative assessment intended to be part of a 
course grade are both uses in Messick’s terms. The use to 
which assessment information is put determines what kind of 
information is needed. 

For example, a very carefully constructed final exam for an 
intermediate French class might produce valid measures of 
students’ achievement in that class, but the same exam would 
not be a valid measure of students’ achievement in a chem- 
istry class. Although this example is obvious and a little silly, 
the same principle applies to misinterpretations of measures 
that are a lot less obvious. That same carefully constructed 
final exam for the intermediate French class might be a less 
valid measure of achievement in a different French class 
where the instructor pursued somewhat different goals for 
learning. 

Validity is arguably the most important quality of a score 
from a classroom assessment, because such scores are used 
for educational decisions with consequences for students: 
deciding what content needs to be reviewed, assigning 
grades, counseling students about what courses to take in the 
future, deciding how a course might be taught differently. It 
is therefore imperative that the scores actually mean what the 
instructor thinks they mean; otherwise, unwise or unfounded 
decisions might be made. Appropriate consequences, and 
lack of inappropriate consequences, are one source of impor- 
tant evidence about the meaning of scores (Messick, 1989). 

Maximizing the validity of information obtained from 
classroom assessments means maximizing the degree to 
which the scores, grades, ratings, or judgments contribute to 
making meaningful decisions about students’ achievement or 
performance of intended goals for learning. To that end, the 
instructor should check to see that the intended learning 
goals themselves are appropriate ones for the course, the 
discipline, and the students; that assessments match the 
goals; and that the assessments’ content matches the particu- 
lar intended use. Table 4 shows the general principles for 
checking these conditions. 

Score information is likely to match learning goals if items 
or tasks for students are clear, the content material matclies 

■ the content of learning goals for the course, and each portion 
oi' the material contributes the appropriate weight or propcM- 
tion of the total score. The level of thinking required for the 
assessment should matcii gcxils for the course, as should the 
demonstration modality, the actual task the assessment poses 
for the student. A complete and representative sample of the 
content and skills to he measured ensures '‘content validity.” 
The first way to ensure this match is through careful plan- 
ning — that is, thinking abcuit as.scssmeni at the stait of overall 
planning for the course and continuing through constructing 
the syllabus and designing individual class activities. What 
kind of information is needed? What performances by stu- 
dents will give that information? 'Fhe second way to ensure a 
match is to write te.st items or performance assc.ssinent tasks 
carefully and thoughtfully, keeping this principle in mind. It 
helps to ha\'C someone else who is familiar with the course 
material review the assessment (asks. 1^'or classroom assess- 
ments in higher education, a careful content review is the 
most important tool Ibr ensuring \ alidity. 
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TABLE 4 

Validity of Classroom Assessments 

Score information is likely to match learning goals if: 

• Items or tasks for students are clear. 

• Content material matches content of learning goals. 

• Each portion of the material contributes the appropriate weight or proportion of 
the total score. 

• The level of thinking (recall, application, analysis, synthesis, evaluation) required 
for the assessment matches the learning goal. 

• The demonstration modal it\^ matches the intended learning goal. (Was the intent to 
identify sometliing, describe something, make something, or do something?) 

• The range of po.ssible items or tasks or contexts is wide enough to represent accu- 
rately the goal for learning. 

• Items or tasks are substantively representative of the nature of intended learning 
tasks. 

Score information is likely to match the intended use for the information if: 

• For grading, items or tasks represent ail the course’s, goals for learning. 

• For instruction, information is fine grained enough to determine not just that
students can or cannot give correct answers, but what their misconceptions are.

• For placement, scores represent all the relevant and necessary background
knowledge and skills.



A “test blueprint” or table of specifications is a method that
helps to plan the appropriate content representation when
writing tests and examinations. It also makes actually writing
the test, or selecting items from a pool of items that are
already written, easier and faster. Table 5 provides an example
of a test blueprint for an exam in a sports medicine class.
Write the content to be covered in rows and the level of
thinking required in columns. Taxonomies of educational
objectives or a discipline-specific set of performance
modalities can be used for this purpose. The important point is to
use a way of classifying that is appropriate to the course.

For many purposes, especially for introductory or survey
courses, a simple two-category designation that distinguishes
recall of information from memory and application of
information or problem solving would be as useful as a more
complicated classification system.

TABLE 5

A Test Blueprint for a Cumulative Final Exam in Evaluating Injuries

Learning goal                        Recall      Application   Total

• Identify signs and symptoms
  of common pathologies              30          15            45 (39%)

• Identify anatomical locations      15          20            35 (30%)

• Describe special tests and
  evaluations                        15          10            25 (22%)

• Make preliminary diagnoses
  based on knowledge of anatomy,
  pathology, and evaluation          0           10            10 (9%)

Total                                60 (52%)    55 (48%)      115 (100%)

Source: Adapted from Platt, Turney, & McGlumphy, 1998.

The second step is to indicate, in the cells created by the 
rows and columns, the number of points that should be 
allocated for that particular content and level of thinking. 

For objective test items, one point usually means one item.
For essay and partial credit items, the number of items will
vary. If, for example, 10 points should be allocated to a cell,
it could be one 10-point test item, two 5-point items, one
4-point and one 6-point item, and so on. Cells should be left
blank (or given a zero) if it is not necessary to test that
content at that level. Thus, totals and percents of the whole for
each row and column allow a quick check for the
distribution of score meaning. It is easy to see whether some
content or type of thinking has been given too much weight, or
too little, and adjust the blueprint accordingly before the
instructor has spent a lot of time writing test questions.
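
The row and column check described above is simple arithmetic, and a short script can make it concrete. The following is a minimal sketch in Python, assuming the blueprint is stored as a dictionary of point allocations; the cell values are those of Table 5, while the data layout and the names `summarize` and `percent` are illustrative assumptions rather than part of any published procedure.

```python
# Cell values reproduce Table 5 (points per learning goal and level of thinking).
# The dictionary layout and the function names are illustrative assumptions.
blueprint = {
    "Identify signs and symptoms of common pathologies": {"Recall": 30, "Application": 15},
    "Identify anatomical locations": {"Recall": 15, "Application": 20},
    "Describe special tests and evaluations": {"Recall": 15, "Application": 10},
    "Make preliminary diagnoses": {"Recall": 0, "Application": 10},
}


def summarize(cells_by_goal):
    """Return (row_totals, column_totals, grand_total) for a blueprint."""
    row_totals = {goal: sum(cells.values()) for goal, cells in cells_by_goal.items()}
    column_totals = {}
    for cells in cells_by_goal.values():
        for level, points in cells.items():
            column_totals[level] = column_totals.get(level, 0) + points
    return row_totals, column_totals, sum(row_totals.values())


def percent(part, whole):
    """Share of the whole test, rounded to the nearest percent."""
    return round(100 * part / whole)


row_totals, column_totals, grand_total = summarize(blueprint)
for goal, total in row_totals.items():
    print(f"{goal}: {total} points ({percent(total, grand_total)}%)")
for level, total in column_totals.items():
    print(f"{level}: {total} points ({percent(total, grand_total)}%)")
print(f"Whole test: {grand_total} points")
```

Changing a single cell and rerunning the summary shows at once how the weight of each learning goal and each level of thinking shifts, which is the quick check the blueprint is meant to provide before any items are written.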

This scheme should not get too complicated. Too many
cells with too few points per cell suggests a microengineering
of scores that is more precise than most classroom
assessments will bear. A test blueprint should not be time
consuming. It should function as a way to sketch out the
plans for a test before writing it, ensuring appropriate
coverage and saving time in the long run. It is much easier for an
instructor to write five good recall items about atomic
structure or write 10 points worth of application questions about
global warming than to start with a blank page and the task
“write an exam.”

A score is likely to represent the content or performance 
domain if enough items or tasks are included so that the in- 
structor can be confident performance level is not just a 
fluke or chance event, if the range of possible items or tasks 
or contexts is wide enough, and if items or tasks are sub- 
stantively representative of the nature of intended learning 
tasks. Test blueprints help plan content coverage. For example,
suppose an exam were supposed to cover the five
assigned readings for a section of a course in 20th century
American novels but included questions about only two of
the novels. If that section of the course were assessed only 
with that exam, then the portion of the grade that test was 
worth would treat that score as if it assessed knowledge of 
all five novels, no matter what the instructor had meant by 
the exam. Would the instructor be sure the students under- 
stood and could discuss all five novels? How much would 
confidence rise about the representativeness of the exam 
score if three novels were included? 

Simple content representativeness is one issue in using 
test items or performance tasks to represent learning goals. 
Another issue is the representative nature of the task. Do the 
tasks students are asked to perform represent the kind of 
tasks the instructor had in mind for learning goals in the 
course? If the instructor's intent is that students should be 
able to discuss the use of Shakespearean images and
references in contemporary American literature, what is the most
representative kind of task to ask students to do? A paper
requiring them to look up references and cross-references
and deal with them in depth comes much closer than an
essay test question where all the references must be
recalled, because “discussion” will be limited to the size of the
student's short-term memory.

Sketching out a course-level blueprint is a good idea for
grading to make sure that the “score” that is the grade for
the course contains the right proportions of different indexes
from different assignments, projects, or exams for each of
the course’s intended learning outcomes. Sometimes a simple
list-style blueprint will suffice for planning proportional
representation of components in course grades. In assigning
course grades, it is much more likely that the grade will
match its intended use (namely, to indicate the achievement
level for the whole course) if assessments represent all the
learning goals for the course.

To be useful for instructional decisions (for example,
deciding which goals to spend more or less time on), scores
must give detailed information. One overall score, for
example, on a final exam or paper is enough to use for grading
purposes, but if scores on a midcourse assessment are to be
used to influence future instruction, the scores must be on
individual aspects of the goals. Did the students have
difficulty with finding the material, with comprehending it and
reasoning about it once they found it, or with the writing of
the paper? Each aspect has different implications for what to
reteach or review. One overall score indicating all three
would not give the instructor this information. Or, on a
midterm exam, it might be helpful to give separate scores for
separate groups of knowledge or skills so that students
know their strengths and weaknesses and can study
accordingly, and so the instructor can aim instruction to hit the
weaknesses harder than the strengths.
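
As a small illustration of reporting separate scores, the sketch below totals one student's points by aspect of the goal rather than as a single number. It is only a sketch under stated assumptions: the component records, the point values, and the function name `subscores` are hypothetical, chosen to mirror the three aspects named above.

```python
from collections import defaultdict

# Hypothetical scoring record for one student's midcourse paper: each scored
# component is tagged with the aspect of the goal it measures.
components = [
    {"aspect": "finding material", "earned": 8, "possible": 10},
    {"aspect": "finding material", "earned": 4, "possible": 5},
    {"aspect": "comprehension and reasoning", "earned": 9, "possible": 20},
    {"aspect": "writing", "earned": 13, "possible": 15},
]


def subscores(records):
    """Total earned and possible points separately for each aspect."""
    totals = defaultdict(lambda: [0, 0])
    for record in records:
        totals[record["aspect"]][0] += record["earned"]
        totals[record["aspect"]][1] += record["possible"]
    return {aspect: tuple(pair) for aspect, pair in totals.items()}


by_aspect = subscores(components)
overall = (
    sum(earned for earned, _ in by_aspect.values()),
    sum(possible for _, possible in by_aspect.values()),
)
print(f"Overall: {overall[0]}/{overall[1]}")
for aspect, (earned, possible) in by_aspect.items():
    print(f"{aspect}: {earned}/{possible}")
```

In this made-up record the overall score hides the fact that nearly all the lost points fall under comprehension and reasoning, which is the kind of detail an instructor needs when deciding what to reteach.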

Reliability 

Reliability refers to the degree to which a score is consistent,
across time or judges or forms of assessment. With regard to
consistency across time, it should not matter whether a final
exam is scheduled on a Wednesday morning or a Thursday
afternoon of finals week; given the same level of study and
preparation, students should expect to receive the same
exam grade no matter which day they take the exam. With
regard to consistency across judges, for example, when a
particular essay is scored in four sections of a freshman
composition course with the same learning goals, one
instructor should not have “easier” or “harder” standards for
marking than another. And with regard to consistency across
forms, consider the case of a student who takes a makeup
exam. If the exam is to give the same assessment
information as the original test, say to cover the same set of learning
objectives and count as the same percentage of the course
grade, then it should allow the student to score
approximately what he or she would have scored had the student
taken the regular exam with the rest of the class.

For “mental measurements” (measurements of achievement
or other inside-the-head constructs), accuracy and
consistency are confounded in a way that they are not for
physical measurements. If a bathroom scale always reads exactly 2
pounds light, it is consistent but not accurate. It is possible to
know that the scale is 2 pounds light, however, only because
external and independent measures exist of what a “pound”
is, against which the scale can be evaluated. No such
external measures exist for learning goals in classes. The only
available information about those goals comes from the
measuring tools, the exams and assignments and projects, used
to measure achievement in the course. So consistency and
accuracy are completely confounded; the best available
information about a student’s “true” level of performance comes
from measurements of performance that, at whatever level,
are at least stable for the student.

This confounding of consistency and accuracy places a
special burden on classroom assessments. In a physics lab, it
is possible to measure an object 10 times, take the average of
the measurements, and use the result as a reliable estimate of
the object’s true measure. For standardized assessments, it is
possible at least some of the time to recruit subjects who
take the same assessment twice to examine how consistent
performance is. But it is not possible to do so for classroom
assessments. Once an assessment is given, it is “over,” and
even if an instructor could persuade a few students to take it
again (an unlikely event), the students would know what the
questions were and would prepare for them, making it not
the same assessment. As common checks for consistency are
not available for measures of achievement in the classroom,
instructors must take particular care to consider reliability
when they design, write, and score classroom assessments.

Reliability and validity are related. Simple consistency, for
its own sake, is not particularly helpful. A measure can be
consistent but be the wrong measure for the job, as in the
case of the French exam given in chemistry class that would
reliably indicate, over and over, that the students did not
know much French. But a score cannot be any more valid
than it is reliable. If assessment results cannot be counted on
to do a pretty good job of estimating students’ real levels of
achievement, they cannot carry much meaning. A very
unreliable score, of any learning goal, might just as well
represent a chance drawing from a fishbowl as performance on
an assessment. The more reliable the score, the more it
makes sense to ask “reliable indicator of what?” The
“what” leads back to validity, the main concern. Viewed in
this manner, the value of ensuring reliability in classroom
measures lies in its contribution to validity, to maximizing
the potential of information to be truly representative of a
student’s performance and thus useful to instructors and
their students. Table 6 presents principles for maximizing
the reliability of classroom assessments.

TABLE 6 

Reliability of Classroom Assessments 

Score is likely to represent consistent, typical 
performance for students if: 

• Performance does not depend on time of day, location, or 
other external factors. 

• Performance is consistent with the student's other similar
work.

• Enough items or tasks are included so that the instructor can
be confident performance is not just a fluke or chance event.

Score is likely to represent a typical judgment (for
rubrics and partial credit) if:

• Different judges would agree on the score.

• The halo effect and other potential biases have been avoided.

• The performance, not the student, has been rated.



A score is likely to represent consistent, typical
performance for students if performance does not depend on time
of day, location, or other external factors, and if
performance is consistent with the student's other similar work.
These requirements are not usually great problems for
classroom assessment in higher education, except, perhaps, in
the event a statistics exam is given in a room next to some
construction featuring an air hammer on concrete. In that
case, day, time, and location can make a difference!

Checking to see whether performance is consistent with
other work the student has done and with expected levels
of performance given what the instructor knows about the
student can be done informally. The classic example is
something instructors have been doing for years: comparing
out-of-class work with in-class work. If out-of-class work is
outstanding but in class a student does poorly, chances are
the out-of-class work is not entirely the student’s own. This
kind of judgment must be made with a clear picture of the
achievement targeted in mind. Written work done outside
class, for example, might be expected to benefit from spell
checkers and extra editing time. These benefits would not
be evidence of unreliability but would rather demonstrate
performance on a task that differs from an in-class essay.

It is important to base expectations for students’
performance on similar work. Individuals sometimes differ
specifically in their achievement levels on various types of tasks and
in various subject matters. Therefore, some students' work on
one kind of task, say writing, may not match their work on
another kind of task, say recall. Or students’ achievement of
learning goals related to the American Revolutionary War may
differ from their achievement of learning goals related to the
Civil War. A Civil War hobbyist, for example, might do much
better on a Civil War assessment, and a student who had
already had a course in 18th century British history might do
much better on the Revolutionary War assessment. The whole
point of a college course is for students to learn, and it is
likely that students who study will do just that. Therefore,
performance that goes up after instruction, because of
studying, does not indicate unreliability.

A source of unreliable classroom assessments that is more
likely to pose a problem than inconsistency across time is
inconsistent judgments. Rater or judge errors are difficult to
check in classroom assessment, because typically one
instructor does all the grading for one class or section.
Instructors do not usually grade work for each other's students. In
some places, to do so might even be considered a violation
of academic freedom. But it remains an important issue that
the quality of the work is judged according to some
specified standard and does not depend on which instructor’s
section a student schedules (Mitchell, 1998). This issue is not
relevant for objectively scored tests, but it is very important
when partial credit points or scoring schemes are used for
essay tests and performance assessments. For these scoring
schemes, a score is likely to represent a typical judgment if
different judges would agree on the score and if the
judgment is not influenced by irrelevant characteristics of
students such as gender, expressed interest, and previous work.

Most instructors realize they must judge the work, not the
student. But it is sometimes hard to bracket and ignore other
information about the student when reading or observing a
student’s work. One such bias is so common it has a name:
the “halo effect,” the phenomenon at work when a good
answer to one problem influences the grader to ascribe more
merit to the answers to subsequent problems than they
actually deserve. The usual advice to avoid this problem is to
grade all answers to the same problem at once, thus making
implicit comparisons from one answer to the next on the
same paper impossible.

Grading all the answers to one problem together has
another benefit for reliability as well. The instructor will get
used to the criteria and scoring scheme for that problem and
be more likely to apply them uniformly to everyone’s work
if the same problem is graded over and over. It is mentally a
much more difficult chore for the grader to use one set of
criteria for one problem and then change criteria to grade
each subsequent question than it is to apply the same criteria
over again to each student’s answer.

A final but very important method for maximizing reliability
of the rater is to work on the clarity of descriptions in
scoring schemes. If each criterion and each point level within 
it has a clear description, the rater’s task of recognizing that 
quality of work when he or she sees it is simplified. Clearly 
defined categories mean less room to guess about which 
category a work sample fits. When scoring schemes are clear, 
raters are likely to make consistent judgments themselves, 
from paper to paper, and they would be likely to agree with 
other raters if more than one person judges the same work. 
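
Where two people can be persuaded to score the same set of papers, the extent to which they “would agree on the score” can be checked directly. The sketch below is one simple way to do so, not a method prescribed here: it computes the proportion of papers on which two raters gave the same rubric score, and the proportion on which they differed by no more than one point. The scores and the function name `agreement` are hypothetical.

```python
def agreement(scores_a, scores_b, tolerance=0):
    """Proportion of papers on which two raters differ by at most `tolerance` points."""
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("Both raters must score the same non-empty set of papers.")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical rubric scores (1 to 5) given by two raters to the same six essays.
rater_1 = [4, 3, 5, 2, 4, 3]
rater_2 = [4, 2, 5, 2, 3, 3]

exact = agreement(rater_1, rater_2)
adjacent = agreement(rater_1, rater_2, tolerance=1)
print(f"Exact agreement: {exact:.0%}")
print(f"Agreement within one point: {adjacent:.0%}")
```

Low exact agreement on a particular criterion suggests that its point-level descriptions are not yet clear enough to guide consistent judgments.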

Summary

“Validity” and “reliability” are used to describe the
meaningfulness, appropriateness, and accuracy of scores and grades for
their intended purposes. These characteristics of quality are of
vital importance because without them, decisions will be
based on misinformation. And instructors who work to ensure
the validity and reliability of their classroom assessment
information will realize another benefit beyond sound information
about students’ achievements. Instructors who have
confidence in their information about students’ achievements
speak more confidently with students about their work, offer
more helpful suggestions for improvement, and feel better
about the effects of their instruction and their control of an
effective means for monitoring and adjusting that instruction.

OPTIONS FOR CLASSROOM ASSESSMENT

A Framework for Understanding Assessment Options 

Student assessment should be multidimensional and the
focus of ongoing communication with students about their
achievement of objectives for the course. The methodology
for classroom assessment can be thought of as a toolkit that
faculty members use for accomplishing their purposes.
Students’ involvement in assessment, at all stages of the process
from design through scoring, is also recommended as a
strategy for teaching and learning and for enhancing motivation.

Several versions of a framework for understanding types
of classroom assessment have been offered (see Stiggins,
1992, 1997, particularly 1992). Assessment methods can be
grouped into three general categories: paper-and-pencil tests,
performance assessments of processes or products, and oral
communication. For each category, objective (right/wrong or
present/absent) and subjective (judgment of degree of
quality) scoring can be developed. Objective scoring is easier to
do than subjective rating, but objectively scored questions
are more difficult to write well than are subjectively scored
questions and exercises. Depending on the author one
consults, portfolios can be considered a fourth category of
assessment or a different sort of beast that falls between the
cracks: part assessment method and part collection and
communication of assessment results.

Different assessments are necessary to cover the full range
of achievement targeted: knowledge, thinking, processes,
products, and dispositions. Table 7 describes and gives some
examples of the various kinds of assessments that can be
used to evaluate achievement of learning goals.

Paper-and-Pencil Tests

College instructors are generally familiar with both
objectively and subjectively scored tests. Test development should
be keyed to the learning objectives for the course. It should
be obvious to the student that what has been stressed in the
course and what is valued knowledge are the focus of the
exercises the students are asked to do. If, for example, a
course had stressed interpreting poetry but a large portion of
the final examination includes identifying poets, dates, and
titles of poems, then scores on the final would not reflect
what the instructor intended the students to learn, nor
would they reflect what students thought they were
supposed to learn.

TABLE 7

Classroom Assessment Options

Source: Adapted from 1987, 1992, 1997.

To design a test, the learning targets must first be
identified and then assessed to decide whether they represent
knowledge, thinking, skills, products, or dispositions.
Knowledge and thinking are usually captured well by well-written
tests, but if the target includes skills or products, a test will
be only a proxy for complete assessment. Two steps are
necessary to make sure the test really taps into students’
knowledge or thinking and not something else. The first is to
design the general form of the exam, giving space and weight
to various topics as appropriate to the instructional intent. A
test blueprint can help accomplish this aim. The second step
is to write clear, unambiguous test questions. Students can
help with this step, but instructors who ask students to write
questions should make sure that final, edited items are well
written, according to the guidelines in this subsection, and
that the final set of items used for a test matches its blueprint.

Objective test items 

Table 8 presents some general guidelines for writing objective 
test items. The purpose of these dos and don’ts is an impor- 
tant one that contributes to the validity of the information in- 
structors will get from students' performance on the test. If a 
test item is written in such a way as to tap into general logic 
or cleverness, then a student’s score will reflect general ability 
as well as the particular knowledge or application that the 
instructor meant to teach. General cleverness is not a bad
quality, but it is not the basis on which a student's work in a
course should be judged. A poorly written test item also in- 
creases the risk that students who know how to answer the 
question will get it wrong, which will cause the test score to 
reflect less achievement of whatever the course was designed 
to teach than is actually the case. 

Each “do” and “don’t” has a reason behind it. For example,
the suggestion to put matching and multiple-choice answers
in logical order, if there is one, is to save students who know
the answer some reading and processing time they should be
spending on the substance of the test material. If the question
is “In what year was the Battle of Hastings fought?” some
students may need to look over a list of choices and decide
among them. But for some students, answering this question is
really a matter of saying to themselves, “Where did she put
1066?” If the dates are listed in order, it is easier to answer
such a question, and the student can answer the question
quickly and move on, saving his or her serious thinking for
more important parts of the test. Alphabetical order works well
for lists of names or places. If no logical order is apparent, or
if putting the answers in order would give clues to the
answers of other items on the test, then the choices should be
scrambled for the same reason: to have students’ scores be as
accurate a representation as possible of what they really know.

TABLE 8

Dos and Don’ts for Writing Objective Test Items

General

1. Use clear and concise language.

2. Prepare a draft and edit it.

3. Proofread the draft from a student's point of view.

4. Test important ideas, not trivial points.

5. Write short, clear directions for all sections of the test.

6. Don’t copy statements from the textbook.

True/false items

1. Make statements definitely true or definitely false.

NOT: The advent of the computer is the strongest force for
social change in the 20th century.

BETTER: Some authors have compared the social impact of
the advent of the computer with that of the printing press.

2. Keep statements short.

3. Have only one idea per statement.

NOT: Captain Ahab was not afraid of death, whereas
Ishmael wanted very much to live.

BETTER: Captain Ahab was not afraid of death.

4. Use positive statements; if the statement contains a “not,”
highlight it.

NOT: The issue of the Emancipation Proclamation in 1863
did not result in immediate freedom for any slaves.

BETTER: The issue of the Emancipation Proclamation in
1863 did NOT result in immediate freedom for any slaves.

5. Make “trues” and “falses” about the same length.

6. Avoid patterns of answers (e.g., TTFF or TFTF).

Writing good test items is a skill that requires practice,
drafting, editing, and all the other elements of good writing in any
format. Writing unambiguous test items is a more
understandable task after an instructor has studied the reasons behind
each suggestion. (See the resources described in “Conclusions
and Further Resources for Faculty.”)

TABLE 8 (continued)

Matching items

1. Number the items in the first column; letter the response choices in the second column.

2. Make items and response choices homogeneous.

NOT: Match the word with its definition.

1. Solid bodies bounded by planar surfaces               a. Absolute zero
2. At a constant temperature, the volume of a given      b. Boyle's law
   amount of gas varies inversely with pressure.         c. Crystal
3. Temperature at which the kinetic energy of            d. Enthalpy of fusion
   molecules is zero                                     e. Ionic radii
4. Process of passing from solid to gas without going    f. Sublimation
   through the liquid state, or vice versa
5. Heat required to melt 1 mole of a substance

BETTER: Match each gas law with the name of the scientist associated with it.

1. The volume of a certain mass of gas is inversely      a. Avogadro
   proportional to the pressure, at constant             b. Boyle
   temperature.                                          c. Charles
2. The total pressure in a mixture of gases is the       d. Dalton
   sum of the individual partial pressures.              e. Graham
3. The rates of diffusion of two gases are inversely     f. Kelvin
   proportional to the square roots of their densities.
4. Equal numbers of molecules are contained in equal
   volumes of different gases if the temperature and
   pressure are the same.
5. The volume of a given mass of gas is directly
   proportional to the absolute temperature, at
   constant pressure.

3. Each response choice should look like a plausible answer for any item in the set. If not, the list
is not similar enough to be a set of matching items.

4. Keep the lists short (3 to 10 items).

5. Separate longer lists into two or more shorter ones, using the principle of homogeneity.

6. Avoid having the same number of items and response choices so that the last answer is not
really a choice.

7. Put the longer phrases in the left column and the shorter phrases in the right column.

8. Arrange response choices in a logical order, if there is one.

9. Avoid using incomplete sentences as items.

10. Keep all items and response choices in a set on the same page of the test.

Completion/fill-in-the-blank items

1. Don't put too many blanks together.

NOT: The ______ left ______ over issues of ______.

BETTER: The Puritans left England over issues of ______.

2. Make the answer a single word if possible.

3. Make sure there is only one way to interpret the blank.

NOT: Abraham Lincoln was born in ______. (A log cabin? Poverty? Kentucky? 1809? A bed?)

BETTER: Abraham Lincoln was born in the year ______.

OR: In what year was Abraham Lincoln born?

4. A word bank (a set of choices in a box or list) is often helpful, depending on whether total
recall is important or not and whether spelling counts.

Multiple-choice items

1. The stem (the numbered section) should ask or imply a question.

2. If the stem is an incomplete sentence, the alternatives should be at the end and should be
the answer to an implied question.

3. If “not” is used, underline it.

4. Avoid statements of opinion.

5. Don't link two items together so that getting the second one correct depends on getting the
first one correct.

NOT: 1. What is the next number in the series 1, 5, 13, 29, ...?

a. 43 b. 57 c. 61 d. 64

2. What is the following number in the series in question 1?

a. 122 b. 125 c. 127 d. 129

BETTER: 1. What is the next number in the series 1, 5, 13, 29, ...?

a. 43 b. 57 c. 61 d. 64

2. What is the next number in the series 1, 4, 16, 64, ...?

a. 128 b. 256 c. 372 d. 448

6. Don't give away the answer to one item with information or clues in another item.

7. Use three to five functional alternatives (response choices). Silly alternatives (e.g., “Mickey
Mouse”) do not draw serious consideration and should not be used. To inject humor into a
test, use a whole silly item, not part of a serious one.

8. All alternatives should be plausible answers for those who are truly guessing.

9. Repeated words go in the stem, not the alternatives.

NOT: Computer-based tutorials are called “adaptive” if they change based on information

a. about the student.

b. about the content material.

c. about the computer.

BETTER: Computer-based tutorials are called “adaptive” if they change based on
information about the

a. student.

b. content material.

c. computer.

10. Punctuate all alternatives correctly, given the stem.

11. Put the alternatives in logical order, if there is one.

12. Avoid overlapping alternatives.

NOT: Which of the following possibilities enabling communication over the Internet is the
best choice for a class discussion in a distance learning course?

a. E-mail

b. Usenet news

c. Chat systems

d. Conferencing software

BETTER: Which of the following possibilities enabling communication over the Internet is
the best choice for a class discussion in a distance learning course?

a. E-mail

b. Usenet news

c. Chat systems

13. Avoid “all of the above” as an alternative.

14. Use “none of the above” sparingly.

15. Adjust the difficulty of an item by making the alternatives more or less alike. The more
similar the alternatives, the more difficult the item.

Note: For more detail, see Linn & Gronlund, 1995; Nitko, 1996; Ory & Ryan, 1993.

Essays and partial-credit problems 

Assessing thinking and problem solving is a good use of the
time and effort it takes to read and score essay tests or
show-the-work and partial-credit problems in math or science. To
really assess thinking, and not merely recall, the question
must present a new problem to the student, one that he or
she has not seen before. The question does not have to be
truly new, just new to the student. As described earlier, even
the most complex reasoning question becomes a matter of
recall if the textbook or class discussion has already laid out
the reasoning for students.

This approach will sound harsh to instructors who are used
to hearing complaints about exams: “We never went over that
in class.” The way around this complaint is to make sure that
class time includes work on new problems, students' analysis
of issues, and the like, so that students understand why new
thinking is important and called for, and learn how to do it.
The solution is not to preview everything on a test; otherwise,
no higher order thinking can be demonstrated. Table 9
presents some suggestions for writing essay questions.

Only the instructor of the class can determine what is new 
and what is not. Consider the example of a freshman English 
class that is reading the Declaration of Independence. An essay 
question about the structure and persuasiveness of Jefferson’s 
argument could require thorough, original thought, or not! 
Suppose a whole class period had been devoted to discussing 
“the structure and persuasiveness of Jefferson’s argument.” 
Then this question would tap students’ recall of the day’s 
discussion. 

This point about novelty of problem and level of thought 
required from the student shows clearly that assessment and 
instruction are related enterprises. Many authors have 
polemicized that assessment and instruction should be related 
and have demonstrated ways to do it well. But whether an 
instructor realizes or intends it or not, assessment and 
instruction will be related, because both are experiences the 
student has with material he or she is supposed to learn. 

TABLE 9 

Dos and Don’ts for Writing Essay Test Items 

Restricted range essay items (usually one to three paragraphs 
per answer) 

1. For most purposes, use several restricted essays rather than 
one extended essay. 

2. Ask for a focused response to one point; state the question 
so the student can tell what kind of response is required. 

3. Do not ask a question that requires merely extended recall. 
Questions should require some critical thinking; for example: 

• explain causes and effects 

• identify assumptions 

• draw valid conclusions 

• present relevant arguments 

• state and defend a position 

• explain a procedure 

• describe limitations 

• apply a principle 

• compare and contrast ideas. 

4. Use clear scoring criteria. 

5. Don’t use optional questions. 

Extended range essay items (answer will be a true “essay” 
form) 

1. Use to test in-depth understanding of a small range of 
content. 

2. Call for students to express ideas in an organized fashion. 
Specify both what should be discussed and how it should 
be discussed. 

3. Allow enough time for students to think and write. 

4. Assign the essay as a paper or theme if out-of-class time is 
needed or if students’ choice and resources are required. 

For more detail, see Linn & Gronlund, 1995; Nitko, 1996. 

Therefore, it is important for instructors to understand the 
nature of this relationship. Writing better essay questions 
and problems will be only one of many good results from 
this understanding. Consider the following scenario: 

This is a true story. A colleague of ours teaches an 
introductory calculus section. Early one term, he and his 
class were working through some standard motion 
problems: “A boy drops a water balloon from a window. 
If it takes 0.8 seconds to strike his erstwhile friend, 
who is 5 feet tall, how high is the window?” On the 
exam, the problem took this form: “Someone walking 
along the edge of a pit accidentally kicks into it a small 
stone, which falls to the bottom in 2.3 seconds. How 
deep is the pit?” One student was visibly upset. The 
question was not fair, she protested. The instructor had 
promised that there would not be any material on the 
exam that they had not gone over in class. “But we did 
a dozen of those problems in class,” our colleague said. 
“Oh no,” shot back the student, “we never did a single 
pit problem.” (McClymer & Knoles, 1992, p. 33) 

This illustration is used to introduce a discussion about how 
“inauthentic” assessment leads students to develop 
problem-solving strategies that help them pass exams but do not 
help them reach the intended learning goals. The discussion goes 
on to describe two categories of these maladaptive student 
responses, “shapes” and “clumps,” which are discussed later. 

The student who had trouble with the “pit problem” had not 
availed herself of a problem-solving strategy that the instructor 
had recommended, namely, drawing the problem (McClymer 
& Knoles, 1992, p. 34). Thus, her inability to solve this 
problem, and the low score that would result, would be an accurate 
and meaningful (reliable and valid) reflection of her learning 
about motion problems. The “inauthenticity” or contrived 
nature of the problem was not the only reason for her failure. 

Viewed from the perspective of developing concepts, the 
“pit problem” shows how true concept development and 
application skills require variation in instruction, not just in 
novel problems on exams, and how the two are related. 
When students learn concepts, they are learning a set of 
defining characteristics. The characteristics that are important 
in a concept’s definition are called “essential attributes.” 
Characteristics that just happen to be there, but are not 
relevant to the concept’s definition, are called “nonessential 
attributes.” The best way to teach a concept to learners who 
are not familiar with it is to present the best examples, plus 
some counterexamples, and include variation on all 
attributes that might plausibly be confused with the definition. 

For example, an essential characteristic of a simile is that a 
comparison of two like things is explicitly stated, commonly 
with “like” or “as,” and an essential characteristic of a 
metaphor is that the comparison is implied. Metaphor and simile 
are often taught together, so that the similarity (both are 
comparisons of two things that are alike in some way) and 
difference (one comparison is explicit, the other implicit) are easy 
to point out. This approach helps with development of the 
concept. A variety of examples are needed, too, so that 
students can learn which attributes are essential. All the examples 
should not be about flowers, or even always about concrete 
things, lest students get the mistaken idea or misconception 
that metaphors and similes have to compare things to 
concrete objects. Similarly, all the examples should not be from 
poems, lest students get the misconception that comparisons 
have to be in poetry to be called metaphors and similes. 

During instruction, then, the series of motion problems 
should not all have been dropping-from-window problems. 
The collection of “dropping problems” could have included 
various settings, buildings, cliffs, holes, scaffolds, and so on, 
and the students could have been asked what they all had in 
common, forcing students to articulate what their working 
understanding of the concept was in time for further 
explanation if misconceptions were apparent. Then students’ task 
would have been to recognize the novel “pit problem” on 
the test as one of the “dropping problems” they had learned 
how to solve. One suspects, however, from the authors’ 
account and the instructor’s suggestion that students draw the 
problems, that attention had already been paid to essential 
and nonessential attributes in that particular calculus class. 
The student who protested may have been demonstrating 
that she did not, in fact, understand the concept. If that is 
the case, then a low score on that problem would correctly 
indicate her lack of understanding. 
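
The distinction between essential and nonessential attributes can be 
made concrete with the two problems quoted in the scenario. The 
following is a minimal sketch in Python, not part of the original 
account. It assumes the usual constant-acceleration model for an object 
released from rest (distance = g × t² / 2), takes g as roughly 32 feet 
per second squared, ignores air resistance, and assumes the pit depth is 
wanted in feet. The name fall_distance is hypothetical. 

```python
G_FT_PER_S2 = 32.0  # assumed gravitational acceleration, ft/s^2 (approximate)


def fall_distance(seconds, g=G_FT_PER_S2):
    """Distance fallen from rest after `seconds`, ignoring air resistance."""
    return 0.5 * g * seconds ** 2


# Balloon problem: the balloon stops at the head of a 5-foot-tall friend,
# so the window height is the fall distance plus 5 feet (assumes the
# friend is standing on the ground directly below the window).
window_height = fall_distance(0.8) + 5.0

# Pit problem: the stone falls all the way to the bottom.
pit_depth = fall_distance(2.3)

print(f"window: {window_height:.2f} ft, pit: {pit_depth:.2f} ft")
```

Both answers come from the same function. The fall time is the 
essential attribute; whether the object leaves a window or the edge of 
a pit is a nonessential one. 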

When instructors do not attend well to concept development 
and make sure that examples and counterexamples are 
clear for students, students will sometimes, understandably, 
attune to aspects of the format of problems or arguments. 
After all, in their essays or solutions, students will be trying 
to convince the instructor who is grading their work that 
they deserve high scores. A well-constructed test will not 
compensate for a lack of concept development in instruction. 
If instruction has been appropriate, however, a 
well-constructed test and a carefully prepared scoring scheme 
can minimize cases when students score well because they 
were skilled at appearing to understand. 



Two different ways that students present responses demonstrate 
a less than deep and critical understanding of concepts, 
problems, or arguments: “clumps” and “shapes” (McClymer & 
Knoles, 1992, pp. 38-39). Clumps are parts of arguments, 
solutions to problems, or critical analyses that pile up analytical 
elements without the logic and explanations that should hold 
them together. Three kinds of clumps include “data packing,” 
in which students write many facts but do not make clear 
what the facts are intended to show; “jargon packing,” in 
which students use much terminology to make it sound as 
though they understand but do not, in fact, explain anything 
with the terms; and “assertion packing,” in which students 
report that something is the case (for example, that the poet 
John Donne used imagery) without identifying, explaining, or 
citing any actual evidence. Sometimes several of these clumps 
are used in the same essay or analysis. 

Shapes are approximations of the logic of criticism, the 
form without the substance (or the words without the 
music). Students might copy the form of someone else’s 
analysis, perhaps from a textbook, or uncritically apply an 
algorithm that looks as though it should fit without understanding 
why it is appropriate. Students may stick to a surface 
recitation of a piece of literature or textbook, demonstrating that 
they have read the material but not that they have used it to 
think critically. A third shape students sometimes use is a 
one-note analysis, describing part but not all of the meaning 
required to answer the question fully. 

McClymer and Knoles (1992) interpreted students’ 
tendencies to answer higher level questions with clumps and 
shapes in light of some survey data collected at their 
institution. Entering freshmen reported that their main 
responsibilities as students were to master informational content and 
acquire critical skills. They did not express much support for 
the purposes of becoming scholars in their own right. 
Freshmen are known to have this expert-oriented, content-based 
understanding of learning; eventually more personal 
ownership of knowledge and thinking abilities becomes important 
for those who make progress through the stages of adult 
intellectual development (Perry, 1970). There is some truth to 
the claim, however, that students in higher education will be 
interested in passing courses as well as in learning, as 
students have invested time and money in enrolling in courses. 
Good instruction, coupled with high quality assessment, can 
help them do both. Clear scoring plans can help instructors 
judge the quality of the range of students’ responses, from 
incorrect, to clumps, shapes, and other approximations, to 
appropriate, clear, well-reasoned answers. 

Scoring essays and partial-credit problems may be done 
by a point method or a rubric method. If the problem or 
essay is one that requires a discrete amount of information 
and one solution or organizational strategy, it is easy to set 
up a system assigning points for each aspect, with a total 
number of points for a complete, correct answer. The 
instructor must make sure that the points reflect the relative 
importance of each aspect’s contribution to the whole. Some 
aspects that are more important than others may be worth 
more points. Moreover, total points for the essay must reflect 
that essay question’s contribution to the whole test. If it does 
not, the question should be weighted accordingly. For 
example, if an essay should be worth 20 points on a test but it 
has only 10 logical points, the score for the essay should be 
doubled and then added to the rest of the score for the test. 
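
The weighting step in the point method amounts to a simple rescaling. 
The sketch below, in Python, is one way to express it and is offered 
only as an illustration. The function name weighted_essay_score, the 
raw score of 7, and the 55 points earned on the rest of the test are 
all made up for the example; only the 10-point and 20-point figures 
come from the text. 

```python
def weighted_essay_score(raw_points, logical_points, intended_points):
    """Rescale a raw essay score so the essay carries its intended weight."""
    if logical_points <= 0:
        raise ValueError("logical_points must be positive")
    return raw_points * intended_points / logical_points


# The example from the text: the essay has 10 logical points but should
# be worth 20 points on the test, so each raw score is doubled.
essay = weighted_essay_score(raw_points=7, logical_points=10, intended_points=20)

# The weighted essay score is then added to the rest of the test score.
rest_of_test = 55  # hypothetical score on the other items
test_total = rest_of_test + essay
```

Writing the weight as a ratio rather than a fixed doubling lets the 
same step handle an essay with, say, 8 logical points that should count 
for 20. 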

Rubrics are descriptive rating scales that are particularly 
useful for scoring when judgment about the quality of an 
answer is required: when, for example, it is not so much 
that the student remembered all the right concepts and 
organized them correctly, but more that the essay was well 
conceived, strongly argued, included original perspectives, or 
the like. Rubrics are good at indicating a range of poor 
quality through excellent quality work, making them ideal for 
scoring many college-level essay questions, show-the-work 
problems, and performance assessments. 

To write rubrics, begin with a description of the criteria 
for good work. Envision a well-constructed, complete, and 
clear answer to the question or problem. What would its 
important characteristics be? They should be directly related 
to the knowledge, critical thinking, or skills that the 
instructor intended the students to acquire, and therefore they 
ought to be related to activities and instruction for the 
course, what students read and studied, and therefore what 
students will expect the test to require. 

List the criteria for good work, then describe levels of 
performance. Be careful to use descriptions (e.g., “grammar and 
usage errors are rare and do not interfere with meaning”) 
rather than judgments (e.g., “good”). Use as many levels as 
there are meaningful distinctions. Meaning is more important 
than having some particular number of levels. The term 
“analytic rubrics” is used when each criterion has a separate scale 
and the essay is rated on each separately, with the several 
ratings summed or averaged for a total score. Table 10 
presents an example of analytic rubrics. The term “holistic 
rubrics” is used when all criteria are considered together on one 
descriptive rating scale and the answer is scored at the level 
that best describes it. Table 11 presents an example of holistic 
rubrics using the same criteria. Rubrics can be shared with 
students ahead of time so they better understand what they 
are being asked to do and what counts as good work. 
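
The arithmetic of analytic scoring, with one rating per criterion and 
the ratings then summed or averaged, can be sketched as follows. This 
is an illustrative Python sketch rather than a prescribed procedure. 
It assumes the three criteria and the four-level scales of Table 10; 
the function name analytic_score and the sample ratings are 
hypothetical. 

```python
from statistics import mean

# Criteria and 1-4 scale taken from Table 10.
CRITERIA = (
    "thesis and organization",
    "content knowledge",
    "writing style and mechanics",
)
LEVELS = range(1, 5)


def analytic_score(ratings, combine="sum"):
    """Combine one rating per criterion into a total (sum) or an average."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    values = [ratings[c] for c in CRITERIA]
    if any(v not in LEVELS for v in values):
        raise ValueError("each rating must be a whole number from 1 to 4")
    return sum(values) if combine == "sum" else mean(values)


# Hypothetical ratings for one student's essay.
ratings = {
    "thesis and organization": 3,
    "content knowledge": 4,
    "writing style and mechanics": 2,
}
total = analytic_score(ratings)            # summed rating
average = analytic_score(ratings, "mean")  # averaged rating
```

A holistic rubric needs no such step, because the scorer assigns a 
single level directly. The separate ratings kept in the analytic case 
are what allow a student to see where the essay was strong and where 
it was weak. 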

TABLE 10 

Analytic Rubrics for a Question on an Essay Test 

Thesis and organization 

4 - Thesis is defensible and stated explicitly; appropriate facts and concepts are used 
in a logical manner to support the argument. 

3 - Thesis is defensible and stated explicitly; appropriate facts and concepts are used 
in a logical manner to support the argument, although support may be thin in 
places and/or logic may not be made clear. 

2 - Thesis is not clearly stated; some attempt at support is made. 

1 - No thesis or indefensible thesis; support is missing or illogical. 

Content knowledge 

4 - All relevant facts and concepts included; all accurate. 

3 - All or most relevant facts and concepts included; inaccuracies are minor. 

2 - Some relevant facts and concepts included; some inaccuracies. 

1 - No facts or concepts included, or irrelevant facts and concepts included. 

Writing style and mechanics 

4 - Writing is clear and smooth. Word choice and style are appropriate for the topic. 
No errors in grammar or usage. 

3 - Writing is generally clear. Word choice and style are appropriate for the topic. 
Few errors in grammar or usage, and they do not interfere with meaning. 

2 - Writing is not clear. Style is poor. Some errors in grammar and usage interfere 
with meaning. 

1 - Writing is not clear. Style is poor. Many errors in grammar and usage. 

TABLE 11 

Holistic Rubrics for a Question on an Essay Test 

4 - Thesis is defensible and stated explicitly; appropriate facts 
and concepts are used in a logical manner to support the 
argument. All relevant facts and concepts included; all 
accurate. Writing is clear and smooth. Word choice and style are 
appropriate for the topic. No errors in grammar or usage. 

3 - Thesis is defensible and stated explicitly; appropriate facts and 
concepts are used in a logical manner to support the 
argument, although support may be thin in places and/or logic 
may not be made clear. All or most relevant facts and concepts 
included; inaccuracies are minor. Writing is generally clear. 
Word choice and style are appropriate for the topic. Few errors 
in grammar or usage, and they do not interfere with meaning. 

2 - Thesis is not clearly stated; some attempt at support is made. 
All or most relevant facts and concepts included; 
inaccuracies are minor. Writing is not clear. Style is poor. Some errors 
in grammar and usage interfere with meaning. 

1 - No thesis or indefensible thesis; support is missing or 
illogical. No facts or concepts included or irrelevant facts and 
concepts included. Writing is not clear. Style is poor. Many 
errors in grammar and usage. 

Two good ways to share rubrics also enable students’ 
involvement in assessment, which in turn enhances motivation 
and learning. When instructors share the rubrics, they can 
also give students some examples of work done at various 
levels. Ask students to rate the examples and tell why they 
scored them as they did. If descriptions of levels are well 
written, students will have to articulate the qualities of the 
work to justify their ratings. A variation on this strategy is to 
share the examples of work with students and have the 
students reason inductively from the examples to write the 
rubrics themselves. This approach works well and helps 
students develop a sense of ownership of the criteria for good 
work. Students are also likely to internalize and remember 
these criteria for good work if they develop them themselves. 
The cost for this instructional benefit is time; therefore, it is 
wiser to use this strategy on very important and authentic 
work than on simpler, more contrived classroom assignments. 

Performance Assessment 

Performance assessment refers to assessment in which a 
student’s product or participation in a process is observed 
and judged. Performance assessments have two parts: a task 
and a scoring scheme. One without the other constitutes an 
incomplete performance assessment. 

Performance tasks differ from essay tests in that they 
usually require sustained performance, often allow the use of 
resources, and often encourage revision and refinement 
before a final product is submitted. (Suggested frameworks 
for categorizing assessments range from multiple choice at 
one end of the continuum to presentations [Bennett, 1993] 
or collections of work over time [Snow, 1993] at the other.) 
A performance task in which a student is asked to solve a 
problem and explain the solution in one or two class 
periods is closer in kind to an essay question than a 
performance in which a student is asked to write a paper about a 
research question, using library and Internet resources, over 
the course of a semester. 

One benefit of some performance assessment tasks is the 
requirement that students write reflections or explanations. 
Writing explanations makes students’ reasoning explicit. It is a 
no-lose situation: Either a student gives evidence of clear, 
logical, appropriate reasoning in a discipline or demonstrates 
where his or her reasoning is flawed, thus identifying 
specifically what area needs more work. Writing reflections affords 
students the opportunity to think about what they have 
learned and what it means. “Knowing what they know” is a 
metacognitive achievement for students, an awareness of 
comprehension that is required for students to become 
self-sustaining, self-directed learners in a discipline. 

Performance assessment tasks should not be simply 
interesting, novel, or appealing activities chosen for their novelty or 
appeal. Performance tasks should be constructed to elicit 
evidence of learning outcomes achieved, which requires thought 
and care. The following example shows how thought must be 
given to performance tasks to ensure that they give evidence of 
students’ achievement of specific learning targets. In the 
example, a novice teacher thinks it would be a good idea to make 
an assessment out of the instructional activity of putting the 
Socrates of Plato’s Apology on trial (Wiggins, 1998). But what is 
wrong with that idea, and what should be done about it? 

• Although the desired achievement involves the text and its 
implications, the activity can be done engagingly and 
effectively by each student with only limited insight into the 
entire text and its context. If a student merely has to play 
an aggrieved aristocrat or playwright, he or she can study 
for that role with only a limited grasp of the text. Also, the 
student’s trial performance need not have much to do with 
Greek life and philosophy. The question of assessment 
validity (Does it measure what we want it to measure?) works 
differently, requiring us to consider whether success or 
failure at the proposed task depends on the targeted 
knowledge (as opposed to fortunate personal talents): [The] 
performance of the student playing, say, one of the lawyers 
may be better or worse relative to his or her debating and 
lawyering skills rather than relative to his or her knowledge 
of the text. 

• It is highly unlikely that we will derive apt and sufficient 
evidence of understanding of the text from each 
individual student through this activity, even if we can hear an 
understanding of the text in some comments by some 
students. In fact, in the heat of a debate or mock trial, 
students might forget or not be able to use what they 
understand about the text, depriving themselves and us of 
needed assessment evidence. This is a crucial problem, 
common to many proposed assessment tasks: [When] we 
employ a particular performance genre, such as a trial, 
essay, or report, as a means to some other assessment end, 
we may fail to consider that the performance genre itself is 
a variable in the task and will affect the results. 

• Although the trial may provide some evidence, it is far 
more likely that in this case a thorough and thoughtful 
piece of writing, combined with an extensive Socratic 
seminar on the text, would tell us more of what we need to 
know about students’ knowledge and understanding. 
Such writing and discussion can certainly supplement the 
trial, but in considering such an idea we should be alert to 
the fact that no single complex performance task is 
sufficient for sound assessment. (pp. 31-32) 

Assessment should be both “authentic” and “educative” 
(Wiggins, 1998). “Authentic assessment” means that 
assessment tasks, whether test questions or performance 
assessment tasks, are grounded in the kind of work people 
actually do in a discipline. That is, tests are not inappropriate, 
but they should be used the same way that drills in athletic 
practice are, for mastery and review of component skills, 
while keeping in mind the end or goal of real work or 
performance in the discipline (Wiggins, 1998). And at least 
some of the time, the “game must be played,” or real-world 
performance must be attempted, to help students see what 
is required for real practice and how they measure up. 
“Educative assessment” means that a primary purpose of 
students’ participation in assessment is to teach, to help 
students improve on dimensions of performance that are 
required for genuine or authentic work, to help them 
conceptualize what that work looks like (Wiggins, 1998). 

Scoring criteria for evaluating performance tasks are con- 
structed in a similar manner to the scoring schemes used for 
essay test questions. Their content should reflect the nature 
of the process or product that is to be scored. 

Criteria for evaluation can be used holistically, considering 
all criteria simultaneously and assigning a single score, 
or analytically, considering each criterion separately and 
assigning a separate score for each. Analytical scoring is 
more helpful as feedback to students than holistic scoring, 
because students can see where their strengths and 
weaknesses are and work on their skills accordingly. Holistic 
scoring takes less time, because one judgment is required of 
the scorer instead of many. Analytic and holistic rubrics for 
performance assessments are similar in form to rubrics for 
essays and partial-credit test problems. 

Table 12 presents a set of rubrics for scoring a Web page 
design project, a performance assessment. It is an analytic 
rubric, chosen as an example here because two of its scales, 
HTML Creation Skills and Navigation, illustrate rather concrete 
descriptions of work at each level (e.g., “at least two lists”), 
while the Web Page Layout scale illustrates more abstract 
quality descriptions (e.g., “hierarchy closely follows meaning”) that 
require substantive judgment. Both kinds of descriptions are 
appropriate for rubrics; the important point is that the 
descriptions match what genuinely reflects levels of quality. 
Depending on the purpose of the assignment, this set of rubrics could 
include a fourth scale to evaluate the accuracy, importance, or 
impact of the content included on the student’s Web page. 

TABLE 12 

Performance Assessment Rubrics for a Web Page Design 

Web page (HTML) creation skills 

Level 1 - No HTML formatting tags; text is not broken into paragraphs. 

Level 2 - Text is broken into paragraphs; headings are used; no other HTML tags. 

Level 3 - Headings, title, tags such as preformatted text, styles, centering, 
horizontal lines, lists, etc. 

Level 4 - Same as Level 3 plus images and hyperlinks to related material. 

Level 5 - Same as Level 4 plus at least 2 lists, images as hyperlinks; color or 
background image, frames, tables, or imagemap. 

Web page layout 

Level 1 - Layout has no structure or organization. 

Level 2 - Text broken into paragraphs and sections. 

Level 3 - Headings label sections and create hierarchy; some consistency. 

Level 4 - Hierarchy closely follows meaning; headings and styles consistent 
within pages; text, images, and links flow together. 

Level 5 - Consistent format; extends the information from page to page; easy to 
read; attention to different browsers and their quirks. 

Navigation 

Level 1 - One page. 

Level 2 - One page with title bar added, heading, etc. 

Level 3 - Two pages (or one page with links within page or to other resources); 
navigation between pages; links work. 

Level 4 - Three or more pages with clear order, labeling, and navigation 
between pages; all links work. 

Level 5 - Title page with other pages branching off and at least four pages total; 
navigation path clear and logical; all links work. 

Source: San Diego County, 1998. Used with permission. 

Another way scoring scales can vary is with regard to 
whether they are generalized or task specific. Generalized 
rubrics can be used in assessment, but they also can be used 
instructionally, shared with students to help them understand 
the nature of the achievement target. Generalized rubrics are 
one way to communicate the characteristics of good quality 
work. Students can use them in their work and in evaluation 
of others’ work. Task-specific scoring rubrics are easier to 
apply reliably the first time, but they cannot be shared with 
students ahead of time, when the assignment is made, because 
they contain answers (e.g., “uses Newton’s Law of Cooling” 
instead of “selects relevant principles and procedures”). 

Instructors who regularly involve students in their own 
assessment can use this distinction to advantage. Share 
generalized rubrics with students or have them develop their own 
rubrics, then present a specific task for students to work and 
then score together. In describing why a performance 
deserves a certain score, students will articulate the specifics. In 
the example above, students would defend a score by saying, 
“They selected Newton’s Law of Cooling, which in this case 
was a relevant principle for solving the problem.” 

Grading cooperative assignments presents an unusual 
challenge. Performance assessment highlights this difficulty, 
because most cooperative assignments are some type of 
performance assessment. Group reports, skits, or other projects can 
result in performance by a group in cases where an individual 
paper or other assessment is not really feasible. Yet college 
grades are given to individuals. Table 13 presents an example 
of a peer evaluation that group members can use. If students 
are aware ahead of time of the expectations and the fact that 
they will be monitored, many problems will take care of 
themselves. If the peer evaluations indicate that one group member 
was not contributing at the same level as others, and if the 
instructor’s observations agree, then the instructor can 
intervene in several different ways, from speaking to the student or 
group to adjusting the grade. 

Oral Questions 

Oral questions during class time help both instructors and 
students to clarify what they know and where misconceptions 
have occurred. They work best in small classes. The instructor 
must know students’ names to call on them, and students 
must know their classmates well enough so that they do not 
perceive the questions as public grilling. Oral questions can 
be factual in nature and check for simple recall or for whether 
or not assigned reading was done. “Why did your author say 
that Thomas Jefferson became interested in education?” “What 
is the chemical reaction that happens during nuclear fission?” 

TABLE 13 

Sample Peer Evaluation for Cooperative Learning Assignments 

Write the names of each member of the group, including yourself, in the boxes in the first column. 
Put a check in each cell in the grid to indicate “fine job, as expected” for each group member for 
each criterion. For any box in which you have reservations about making a check because a group 
member did not meet your expectations for a criterion, write a brief comment. For any box in 
which you would like to comment on truly exceptional performance, please do so. 

Source: Adapted from Munson, 1995. 

For oral questions to indicate accurately what the class as a 
whole understands, it is important to sample a range of 
students. The range should include various abilities as well as 
various interests. Always calling on students who have their 
hands raised will bias the information gained about class 
members’ understanding. Most of the time, a disproportionate 
number of students who do understand the material in a lesson 
will be represented among those who volunteer to participate. 
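
One simple way to avoid relying on volunteers is to draw names at 
random and to call on everyone once before anyone is called on twice. 
The Python sketch below illustrates that idea only; it is not a 
procedure described in the sources cited here. The function name 
next_to_call and the roster names are hypothetical, and the sketch does 
not by itself balance abilities or interests, which an instructor would 
still need to consider. 

```python
import random


def next_to_call(roster, called_so_far, rng=random):
    """Return a randomly chosen student who has not been called on yet.

    Once everyone has been called on, the cycle starts over.
    """
    remaining = [s for s in roster if s not in called_so_far]
    if not remaining:  # everyone has had a turn
        called_so_far.clear()
        remaining = list(roster)
    choice = rng.choice(remaining)
    called_so_far.add(choice)
    return choice


roster = ["Ana", "Ben", "Chi", "Dev"]  # made-up names
called = set()

# One full pass through the class: each student is called on exactly once,
# in an order that does not depend on who raises a hand.
order = [next_to_call(roster, called) for _ in range(len(roster))]
```

Because the draw ignores raised hands, the responses come from a wider 
range of the class than volunteers alone would supply. 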

Handled well, oral questions provide good assessment 
information for instructors about students’ understanding and 
interest. This information is best interpreted for the group 
(class). The use of assessment information about individuals’ 
participation in class discussions for grading is more 
problematic, as personal and group dynamics as well as 
availability of “air time” mean such information gives an incomplete 
picture of an individual’s understandings. 

Ask students questions that require knowledge at all cog- 
nitive levels. As for test questions, it is easier to pose recall 
questions in class than to ask questions that require applica- 
tion of principles, analysis of issues, or other complex think- 
ing. Similarly, it is easier to judge the adequacy of responses 
to recall questions than to questions requiring more thought. 
The kind of questions the instructor asks should match the 
kind of information the instructor needs to know. Preparing
some higher order questions ahead of time is a good strategy,
as questions made up on the spot in a class are likely to
be recall questions. Asking “why” as a follow-up to concept
questions is also a good way to elicit thinking from students. 

Portfolios 

A portfolio is a purposeful collection of a student’s work, 
often with samples of work collected over time and with 
reflections about what was learned or what a piece is supposed
to demonstrate (Arter, Spandel, & Culham, 1995; Nitko, 1996).
Some authors consider portfolios an option for
assessing work in their own right. Others consider portfolios 
a collection of various assessments. Portfolios constructed for 
a single course are usually designed to reflect achievement of 
the particular goals for that course. This purpose contrasts 
with the traditional artist’s portfolio, which is designed to 
show best work in a field and may represent work done in
several courses and outside courses. Some students, most 
notably in the fine arts, will develop such portfolios during 
their college careers, but they are not the focus of this discus- 
sion. The following review is limited to portfolios used to 
demonstrate achievement of learning goals for a course. 

The most important point to decide when a portfolio is se- 
lected as an assessment tool is its purpose. What learning or 
accomplishments is it intended to show? And is it intended to 
illustrate progress, the process on the journey toward achieve- 
ment, or just final products? A portfolio for a writing class, for 
instance, may include a series of drafts of various works with
reflections on how revisions were made and what improvement
was shown, to demonstrate how a student understands
the writing process, or it may be a collection of finished
works to demonstrate the quality of final products.



Another consideration for portfolios is the extent to which 
a portfolio itself, as a collection, would be more valuable 
than the uncollected assessments individually. What would 
be gained by holding a collection of work over time, in one 
location and with periodic review, over simply assigning, 
discussing, and evaluating each assignment separately? For 
some purposes, such as demonstrating students’ development 
into “writers” by showing how they have progressed, and for 
leaving the evidence where they will look over it repeatedly 
to reflect upon it (which in itself will contribute to development
as a writer), the collection of work is the answer. But
portfolios consume time and space, and if they are simply a
storage box for a series of reports, they are not worth the 
extra time and space they take to construct and review. 

For portfolios to be effective methods of assessment, it is 
essential to define clear and complete performance criteria
against which the work will be compared (Arter, Spandel, &
Culham, 1995). Although the criteria are used to score or
grade students, that is not their most important function in a
portfolio. The importance of criteria is to ensure that students 
use them as guides for selecting what goes into their portfo- 
lios and as guides for reflecting on their work. Thus, how the 
criteria are written is very important, because the language 
used in the criteria becomes the language in which descrip- 
tions of quality work are phrased. The contribution a portfo- 
lio can make that other forms of assessment commonly do 
not is this aspect of reflection by students, of living with and 
revisiting past work, of setting goals for future work and then 
evaluating whether and to what degree the goals were met.

A survey of academic vice presidents and deans at all 
Carnegie classification Baccalaureate College II institutions 
found that of 395 respondents, 202 used portfolios for insti- 
tutional outcomes assessment, classroom assessment, and/or 
other purposes such as admissions or placement (Larson,
1995). Those 202 administrators were sent a second survey,
asking for more details, to which 101 responded. Among 
respondents using portfolios, 47% used them for classroom 
assessment. Administrators reported up to 29 years of class- 
room use of portfolios, with a mean of 6.4 years. The most 
common portfolio contents reported included final drafts of 
papers (50%), student projects (47%), journals/logs (41%), 
self-evaluations (35%), faculty evaluations (26%), videos 
(26%), and drafts of final work (23%). Methods of selecting the
contents of portfolios used in the classroom included selection
by faculty and student together (37%), selection by student using
faculty guidelines (34%), selection by faculty alone (9%), and 
selection by student alone (4%). Rubrics or scoring schemes 
for evaluating classroom portfolios were reported as not 
used (44%), developed by faculty for institutional outcomes 
assessment (31%), developed by faculty to serve their own 
purposes (29%), developed by an accrediting agency (4%),
and “other” (5%). Classroom portfolios were typically re- 
turned to the student (Larson, 1995). 

On the positive side, administrators reported that portfo- 
lios were useful, powerful assessment tools whose chief 
strengths involved participation by students, the collection of 
multiple assessments, and the ability to demonstrate progress 
over time. On the negative side, administrators reported concerns
about the logistics, accuracy, and grading of portfolios.

Summary 

Options for classroom assessment include paper-and-pencil 
tests, performance assessment, oral questions, and portfolios. 
Each has its strengths and weaknesses and is particularly ap- 
propriate for different learning goals. Students’ involvement in 
any of these methods increases their motivation, learning, and 
sense of ownership of the material. Once an instructor has 
decided what methods are most appropriate for assessing
learning goals for students, it is important to communicate the 
decision to students. The syllabus is an appropriate place to 
do so. If a syllabus makes clear to students what their goals 
for learning are, how they will be assessed, and how those 
assessments will be combined for their course grade, students
can monitor their own learning, itself a worthwhile goal for
higher education.



ASSESSMENT IN THE DISCIPLINES



General principles for good assessment apply in all classes, no 
matter the discipline or course level. But the subject does mat- 
ter. Achievement goals differ among disciplines and among 
course levels within disciplines. To decide what assessment is 
appropriate for a particular purpose in a particular class, an 
instructor needs to keep in mind the discipline-specific knowl- 
edge or skills to be assessed as well as the assessment princi- 
ples that have been the focus of the first four sections. 

If assessment of knowledge of a range of facts and concepts
is required, as for an introductory survey course in a
social or natural science, a paper-and-pencil test with objec- 
tively scored items is suitable. The important issue here is to 
make sure the questions are clearly written and that they 
represent a good sample of the content domain. If assess- 
ment of application of knowledge to original work of some 
sort is required, as for science laboratory skills, social science
analysis of sources, or English use of literary devices in
writing, then some sort of performance assessment is more 
appropriate than a test. The important issues are to make 
sure the task actually requires the knowledge and skills that 
the instructor intends to assess and to make sure that the
scoring criteria accurately describe quality work. 

Because the different disciplines do different work, instructors
historically have focused on different aspects of
assessment. For example, English instructors have long been
concerned with the assessment of writing. A good source of
ideas in the disciplines may be found through the various
professional organizations, most of which have both publications
and Web sites. Several resources are discussed in this
section as examples, but they are by no means meant to be
exhaustive. Readers are urged to look at the material from
their own professional organizations through the lens of the
assessment principles of validity and reliability.

Additional excellent examples from college courses in specific
disciplines of student assessments and of sets of student
assessments that make up whole courses and their grades
may be found in Effective Grading (Walvoord & Anderson,
1998). These examples demonstrate good principles of assessment
in the various disciplines and include assignments,
scoring guides, and rationales for what the various instructors
were trying to accomplish, in enough detail to serve as models
that readers could use in their own courses. The book includes
examples from biology, business management, composition,
dental hygiene, economics, education, engineering,
English, food and nutrition, history, mathematics, psychology,
sociology, Spanish, statistics, and Western civilization.

Assessment plans should be part of planning for a course 
from the very beginning (Walvoord & Anderson, 1998), but 
often they are not. One way to approach this problem is for 
instructors to search in their professional literature for examples
of activities or assignments that are meant to be instructional
activities, and then see whether they can be adapted
for assessment. Sometimes doing so will mean incorporating 
feedback and scoring mechanisms into the instructional 
activity and using the same activity for both purposes. It has 
the good effect of forcing the instructor to describe for the 
students the characteristics of good work, a step often over- 
looked in planning a course. Sometimes, a second version of 
the activity can be used for assessment after the first has 
been used as an instructional or practice activity. Another 
source of ideas for performance-based assessments is the 
work that professionals in the various disciplines actually do. 
Using these performances as the basis of classroom assign- 
ments and assessments is likely both to assess students’ use 
of important knowledge and skills and to help students un- 
derstand why they are learning the knowledge and skills in 
the first place. Too often classroom activities feel contrived 
and are not similar to real work, affecting their value for 
both assessment and motivation of students. 

English/Language Arts

The assessment of writing is of major importance for college 
English classes. The language arts involve two types of writing:
composition and literary analysis. Composition courses
generally teach students how to write in various genres of 
literature. Literature courses teach students how to approach 
and analyze authors’ premises, values, purposes, sense of 
audience, development of characters, use of imagery, and so 
on. Because the goals of instruction differ, appropriate assessment
methods will emphasize different points. For composition
courses, methods that involve students in the production
and revision of writing are needed, with all that it
implies. Students (or any authors) must be invested in a topic
and have something to say about it before they can write
well about it. Analytical skills are important for composition,
but they rank behind expressive skills. For literature courses,
analytical skills move to the forefront. Logical expression
through structuring and supporting a good argument is possible
only if students can develop defensible theses and gather
supporting arguments and details. These differences show in 
performance assessments in both the tasks students are as- 
signed and the scoring schemes or rubrics with which they 
are evaluated. Because the criteria for evaluation differ, feed- 
back on students’ work will also differ. 

A body of research is available on providing feedback in 
composition classes. An investigation of the effects of a training
seminar for new English teaching assistants on their grading
of and feedback on student papers (Liggett, 1986) offers
some thoughtful comments about grading papers. The author 
had 12 teaching assistants grade the same paper both before 
and after training, which included readings about assessment, 
seminar discussions, and practice with over 200 papers. “Before”
and “after” papers were coded for format, placement,
focus, and purpose of comments. The new teachers made
more comments after training, more substantive comments
instead of emotional comments such as “good!”, and more
comments about what the student was trying to say instead of
the mechanics of expression. After training, new teachers
were more confident that their feedback was appropriate and
that their grades were, in fact, more reliable, demonstrating a 
higher level of agreement among graders, even though overall 
the grades for the same paper dropped slightly (Liggett, 1986). 

After training and practice grading papers, and thus presum- 
ably after giving some thought to the process of grading pa- 
pers and providing feedback, the new teachers changed roles. 
Before the seminar, their grading and feedback demonstrated 
their primary approach to their role was as editors; after the 
seminar, their primary approach was as instructors. The evalua- 
tion and judgmental aspects of the task, that is, actually assign- 
ing a grade, became more accurate (Liggett, 1986). 

Moreover, some of the new teachers would have benefited 
from instruction in writing themselves. Their feedback indi- 
cated they were not always able to give good and appropriate 
responses to students’ work, according with the general princi- 
ple of assessment that instructors must have a good, clear 
grasp of the achievement target themselves to be able to teach 
the subject and assess students’ knowledge of it (Liggett, 1986). 

Liggett also surveyed high, medium, and low achievers among
the students of the 12 teaching assistants, asking them to
describe their reaction to the feedback. Reaction was mixed.
Students do not learn much when writing teachers “fix” their
work by closely editing it for them, and many students
recognized that point. Others preferred more specific feedback.
Which kind of feedback is “best” depends on the purpose of the
grading, a point that needs to be clarified for teachers of
college English (Liggett, 1986). If the purpose is instruction,
then substantive comments that help students think on their
own are most helpful. If the purpose is judgment, then comments
that help students see where they lost specific points are
helpful. Conflicting purposes of evaluation continue to exist.
Where conflicts exist, the function of instruction and education
for students should take precedence (Walvoord & Anderson, 1998;
Wiggins, 1998). In this case, it means that instructors’
“editing” students’ work is less desirable than instructors’
comments that point students in a direction to edit their own
work.

Written feedback is an important but complex area of
study in its own right. Teachers’ comments on essays can
foreclose students’ thoughts too soon, turning revisions into
“what the teacher wants” if teachers make marginal comments
such as “You need more focus here” (Welch, 1998).
Welch suggests a strategy she calls “sideshadowing,” which
she adapted from the theories of Morson and Bakhtin. She
invites her students to turn in essays with their own marginal
comments on them. Her responses, in turn, are reactions to
their written essays, to their apparent intentions for the essay
revealed in their marginal comments, and to the contemplations,
conflicts, or decisions revealed in the students’ marginal
comments. In this way, both student and instructor see
suggestions of what several different directions for revision
might be, projected from the multiple perspectives on the
writing. Revisions become more thoughtful, and the process
reflects more ownership and decision making (and learning
about writing) for students than when students think only
about the instructor’s suggestions for improvement. Theoretically,
even comments that instructors intend as open-ended
invitations for students to think about revisions can be
experienced by students as foreclosing their decisions and
foreshadowing the “revision” to come (Welch, 1998).

Mathematics 

The assessment of problem solving is of major importance in
mathematics. There are many types of problems, however,
depending on the context. Engineers, businesspeople, scientists,
and mathematicians encounter in their work different
kinds of problems that require mathematical solutions. 
Therefore, selection of problems, the reasoning required to 
solve them, and the style of communicating results should 
depend on the purpose of the mathematics course and its 
goals for learning. 

One of the best ways to assess how students actually solve
problems is to ask them to explain their reasoning, that is, to
write about their math (National Council, 1989; Stiggins,
1997). Some writing assignments for calculus classes (from the
mathematics department Web site at Franklin and Marshall
College) include interesting titles such as “The Case of the
Dead Doornail,” “The General Spore,” and “The Case of the
Fall From Grace” (Crannell, 1998; see also Crannell, 1994).
What sets these assignments apart from some others is the
“Checklist for Your Writing Project” that serves as both an
assessment tool and a clear description for students of the
achievement target they are aiming for (Crannell, 1994, 1998).
In other words, the questions on the checklist describe the
characteristics of a clear and well-reasoned explanation,
beginning with a clear restatement of the problem to be solved
and moving through explanations of all the steps to the solution
(see Table 14). Restating a problem in one’s own words
is a recognized way for students to demonstrate comprehension
and not merely memorization (Nitko, 1996). A second
important feature of this checklist is that it is a tool for
students’ assessment of their own work as well as for the
instructor’s grading. This approach increases students’ ownership of
the material and motivation and therefore improves learning.

TABLE 14

Sample Checklist for a Mathematics Writing Project

Directions:

Please attach this page with a paper clip to your writing assignment
when you turn it in. This list will be used by your instructor
to grade your assignment and will be returned to you with
comments. Keep a copy of your paper for your own reference.
Please feel free to use this checklist as a guide for yourself
while writing this assignment.

Does this paper:

1. Clearly (re)state the problem to be solved?

2. State the answer in a complete sentence that stands on its
own?

3. Clearly state the assumptions underlying the formulas?

4. Provide a paragraph that explains how the problem will be
approached?

5. Clearly label diagrams, tables, graphs, or other visual
representations of the math (if they are indeed used)?

6. Define all variables used?

7. Explain how each formula is derived, or where it can be
found?

8. Give acknowledgment where it is due?

In this paper:

9. Are the spelling, grammar, and punctuation correct?

10. Is the math correct?

11. Did the writer solve the question that was originally asked?

Comments:

Source: Crannell, 1998. Used with permission.

The NCTM (National Council of Teachers of Mathematics)
Standards (1989) recognize the importance of developing
students’ “mathematical power,” that is, their abilities to
explore mathematical ideas, reason mathematically, solve
nonroutine problems, and communicate mathematical concepts,
all with some degree of self-confidence. To that end,
the council recommends instruction and assessments aimed
at developing these skills. Such assessments would include
multistep, partial-credit problems and explanations of mathematical
reasoning used to solve the problems, most easily
scored with a problem-solving rubric or with a checklist
such as the one recommended by Crannell (1994, p. 201;
1998). One of Crannell’s calculus students commented, “You
don’t realize that you have a gap in understanding until you
have to explain how to do it” (Crannell, 1994, p. 199).

Social Sciences 

As with English and mathematics, work in the social sciences
requires analysis and writing skills as well as a command
of discipline-specific facts and concepts. Notwithstanding
these similarities, social science assignments differ
from English or math assignments because of the differing 
nature of the work. Social science work includes writing 
histories, writing policy analyses, and examining economic 
and social trends.

Analytical and writing skills should be developed from the
beginning of study. For example, the learning goals for a 
100-level course in Western civilization included mastery of 
the facts and concepts of European history from 1500 to 1800 
and beginning skill development in historical argumentation 
(see Walvoord & Anderson, 1998). Following the implications
of these learning goals, the professor organized both his 
instruction and his assessment to coordinate with them. 

Exposure to facts and concepts was accomplished by requir- 
ing short written assignments for out-of-class reading. In this 
way, the instructor arranged that most of the students would
come to class having been exposed to the same new informa- 
tion, freeing class time to work on the process of using the 
information, now summarized in the students’ own writing, for 
developing and supporting historical arguments to answer such 
questions as “Was Louis XIV of France a good king?” To construct
a course in this manner and do it well, the instructor must
have a clear picture of the achievement target. Table 15 pre- 
sents the professor’s analysis of the skills his students needed to 
develop. Each skill has implications for instruction and for the 
performance tasks and scoring criteria to be used in assessment. 

Students’ self-evaluations remain an important principle of
assessment in the social sciences. In a self-assessment strategy
used with freshman- and sophomore-level European civilization
classes and junior- and senior-level history of science classes,
students reviewed their own analysis of primary sources and
written research papers (Steffens, 1991). A simple three-question
“self-conference sheet” could also be used in peer conferences.
What makes the questions good and appropriate for self-assessment
is that they match the task’s intended learning outcomes.
Students are asked to check whether they have clearly
described a hypothesis or thesis and whether they can identify
how a reader would understand what their thesis is, points that
are central to actual research writing. These questions also
match what the instructor will look for in his evaluation of the
students’ final papers, which will count in their grades. Follow-up
questions, probes, and additional specific questions, general
or tailored to specific assignments, are useful during the process
of drafting and revising the research papers (Steffens, 1991).

TABLE 15

Sample Analysis of History Skills Required
For an Argumentative Essay

1. Reading accurately (including an accurate sense of chronological
narrative). Students must be able to report accurately
on what they have read. They must know, for example, that
events in 1645 could not have caused events in 1641.

2. Realizing that published works have authors, including
paying attention to authors’ personalities, possible biases,
and attempts to organize material for the reader. Students
should know who wrote their textbooks and be aware of
the major section and chapter headings in them.

3. Perceiving and using standard analytical categories, including
political, social, economic, religious, and cultural
factors often cited in explaining past events.

4. Perceiving historical theses. Students must be able to see
that historians argue about the past. They debate, for example,
whether or not absolutism was beneficial to the
majority of the French subjects of Louis XIV.

5. Using written sources as evidence. Facts become evidence
only when brought forward in relation to a thesis. Both
primary sources (contemporary eyewitness) and secondary
sources may be used to state or defend a historical thesis.

6. Stating and defending a historical thesis. Accurate and
specific examples and evidence are key to this skill; authors
of secondary sources may be used as models, which
can be done in two stages: (a) defending a thesis selected
by the instructor and (b) choosing one’s own thesis.

7. Defending a historical thesis against counterarguments.
Agreeing with one author of a secondary source is not
enough; students need to say why they rejected carefully
argued opposing views. Again, accurate and specific examples
are the key.

Source: J. Breihan, cited in Walvoord & Anderson, 1998, p. 82. Used with
permission.

Natural Sciences

In college science classes, in-class examinations continue to
be a mainstay of student assessment (Moscovici & Gilmer,
1996; Tobias & Raphael, 1995), stemming, perhaps, from the
fact that for many science courses, knowledge of a large number
of facts and concepts has been considered the hallmark of
mastery. Science faculty in fact have been observed to resist
alternative methods of assessment (Moscovici & Gilmer, 1996),
perhaps in part because development of alternative assessments
is not the sort of work that counts as scholarly publication
in the sciences (Tobias & Raphael, 1995). Scientists, it
seems, rely on fewer and less reliable measures of students’
achievement (if one considers student assessments the “measures”
taken in the “study” of one’s course instruction) than
they would in the studies of the physical world for their scientific
research (Tobias & Raphael, 1995).

Nevertheless, many science instructors are adjusting their 
examinations, unpublished and largely unshared, and some 
changes are more sound than others. The changes include all 
kinds of strategies for adjusting points and format (Tobias & 
Raphael, 1995). In general, these innovations will be helpful 
to the extent that they pass the criteria for valid indicators of 
achievement. For strategies that deal with allocating points, 
the proportion of points earned for demonstrating achievement
of various knowledge, skill, and thinking targets when
final grades are assigned must match the intended goals of
instruction, both in content and in relative weight. For strategies
that deal with an assessment format, the changes involved
must allow the actual achievement the student demonstrates
to be what was intended for the course's learning
goals. Some formats tap recall more easily than reasoning or 
understanding, although this judgment is not a simple one. 

For example, a good in-class test item can call forth reasoning 
and demonstration of a student's understanding, while a bad 
take-home test item may simply require students to use the 
index of a textbook and transcribe an answer from the book
to the test pages. 
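
The point about allocating points can be checked with simple
arithmetic. The Python sketch below is an illustration only, not a
procedure taken from the sources cited here: the target names, the
intended weights, the point totals, and the function name weight_gaps
are all hypothetical. It compares the share of total course points each
achievement target actually carries with the weight the instructor
intended.

```python
def weight_gaps(intended, points):
    """Return actual share of points minus intended weight, per target.

    `intended` maps each learning target to its intended share of the
    grade (shares are assumed to sum to 1.0). `points` maps the same
    targets to the points available for them across the course.
    A positive gap means the target carries more weight than intended.
    """
    total = sum(points.values())
    if total <= 0:
        raise ValueError("points must sum to a positive number")
    return {target: points.get(target, 0) / total - weight
            for target, weight in intended.items()}


# Hypothetical course: all numbers are invented for the example.
intended = {"recall": 0.30, "reasoning": 0.50, "lab skills": 0.20}
points = {"recall": 220, "reasoning": 100, "lab skills": 80}

gaps = weight_gaps(intended, points)
for target, gap in gaps.items():
    print(f"{target}: actual share differs from intended by {gap:+.2f}")
```

In this invented case recall carries far more of the grade than
intended and reasoning far less, which is the kind of mismatch between
points and goals that the paragraph above warns against.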

Laboratory work and original research are a mainstay of
scientific disciplinary study. Walvoord and Anderson’s Effective
Grading (1998) includes a running example from
Anderson’s biology classes. In a senior biology course, she
assigns a research project in which students must “compare
two commercially available products on the basis of at least
four criteria to determine which is the ‘better’ product as
operationally defined” (p. 39). This assignment directly relates
to a goal for the course that students will be able to design,
conduct, and communicate the results of original research as
well as to a larger purpose many of her students have for
taking the course: that they will soon be hired by companies
for which they would perform this kind of work. Thus, students’
performance on the assignment may be expected to
indicate in a valid manner their level of achievement.

The scoring criteria developed for the assignment use the 
principle that the scoring criteria and descriptors at the vari- 
ous levels on the rubrics themselves embody the characteris- 
tics of high-quality work (Stiggins, 1997). The reader is en- 
couraged to look in more detail at the assignment and its 
rubrics. Good instruction and good assessment mingle in 
this example, and the result is amusing as well as uplifting. 
Students who apply Anderson’s rubrics for a good report 
title, for example, find that “The Battle of the Suds” does not 
pass muster, while “A Comparison of Arizona and Snapple
Iced Tea for pH, Residue, Light Absorbency, and Taste” does
(Walvoord & Anderson, 1998, pp. 70-71).

Summary 

Assessment in the disciplines relies on the previously described
general principles for ensuring that assessment information
is meaningful, useful for its intended purpose, and
accurate. Its toolbox includes the paper-and-pencil test, per- 
formance assessment, oral question, and portfolio options. 
Differences in assessment among the disciplines stem from 
the fact that real work differs among the disciplines and from 
its corollary that the goals of learning for a course differ
among the disciplines. This section has provided some exam- 
ples of how the general principles for assessment apply in 
English, mathematics, social sciences, and natural sciences. 



GRADING 



This section takes up the first of two general questions about 
grading: How can the results of several different assessments 
be meaningfully combined into one composite grade for a 
course? (The second, about whether grade inflation exists 
and, if so, what can be done about it, is addressed in the 
next section.) Grading is a way to report or communicate 
information about a student’s achievement in a course. The 
next subsection briefly explores ways to communicate stu- 
dents’ achievement, and the following three offer information 
for instructors who wish (or need) to compute course grades 
as composites of the kinds of assessments discussed in the 
previous four sections. 

Ways to Communicate Students’ Achievement 

All sorts of good reasons exist to communicate with students 
about their achievement. Grading is the most formal, consti- 
tuting what has been called “official assessment” (Airasian, 
1994). Instructors often wish to communicate with their stu- 
dents throughout the learning process, providing formative 
feedback or information students can use to monitor their 
progress, understand where they still need work, and im-
prove their performance. Informational feedback that helps
students improve their own work, rather than merely communicating
a judgment like “fair,” helps with students’ motivation be-
cause it places a tool for improvement in their own hands.
Formative assessments need not be part of official assess- 
ments: that is, it is not necessary to record a grade for every 
assignment students do. But homework problems and other 
formative assessments should be checked. It is not enough 
to simply note that the homework was completed.

The score on homework and other formative assessments
need not be recorded if the intent of an assignment is prac-
tice and formative assessment. Doing an assignment for
which the score does not count in the final grade allows
students to practice and make genuine mistakes so they can
see where more study is needed. But an assignment that
does not count in the final grade may also send a message
to students that the assignment is not very important. How
students perceive the importance of work that does not
count in a final grade depends on several factors, including
how clear it is that the assignment provides valuable practice
on important learning outcomes, the level of students’ intel-
lectual development and views of the purposes of education
(Perry, 1970), and the assessment environment in the class-
room the instructor has created (Stiggins & Conklin, 1992). 

Formative assessment and feedback are important for learn-
ing to occur (Black, 1998). Summative assessments provide 
overviews or summaries of previous learning (Black, 1998). 
Course grades, then, are summative assessments. Given the 
structure of college courses, the formative and summative 
functions of interim assignments blur. Any course assignment 
whose feedback students use to improve further work is func- 
tioning as formative assessment. Yet retrospectively from the 
end of the course, information from some of these formative 
assessments may be appropriate to select for construction of an 
overall indicator of students’ achievement — namely, the grade. 

Grading homework and other interim assignments must be 
kept at a manageable level, especially for large classes (see
Walvoord & Anderson, 1998, for several good suggestions).
One strategy is to develop simple rubrics for evaluating as- 
signments; not every assignment requires written comments. 
Another strategy is to grade assignments on one particular
criterion (for example, support of an argument appropriate to
the lesson for which it was assigned) and ignore other possi-
ble criteria. A third strategy, important for its instructional and
motivational utility as well as its efficiency, is to make good
use of checklists, reflection, and other strategies for students' 
and their peers’ assessment of the work. 

Another useful way to communicate with students about 
their achievement is in a conference, either individually with 
the student or in small groups. Students may make appoint- 
ments on their own initiative with an instructor outside class
time to see how they are doing. An instructor may also wish
to schedule student conferences as part of a course, either 
during or outside class time, to provide feedback about stu- 
dents' work. 

Yet another way to communicate with students about
their work is written feedback on papers, projects, or other
assignments, either in addition to or in place of a grade. If
this feedback is to be informative for students and useful for
future improvement, it needs to do just that: give informa-
tion (Ryan, Connell, & Deci, 1985). A comment like “good
job” does not help the student as much as a comment telling
why the job was good, for example, “good explanation of
the poet’s imagery.” A grade of B alone does not help the student
understand what would be needed for an A; a comment can, for example,
“This essay does a good job of contrasting Ahab’s and Ish-
mael’s points of view. The essay would be stronger if you
used these arguments to conclude what point(s) you think
Melville is trying to make as an author.”

Course Grades and the Nature of Composite Scores 

Course grades usually require combining several individual 
grades into one. Several different valid methods can be used 
to aggregate grades on a set of course assignments into a 
student’s course grade. The method should depend on the 
kind of assessment information available, which in turn
should depend on the decisions about purpose and kind of 
scoring discussed in the preceding four sections, and on the 
kind of information the grade is intended to convey. 

If a grade is meant to indicate achievement of a learning 
goal for the course that is an end-state result, such as writing 
a certain kind of research report, and not an average over a 
set of learning goals that were addressed one by one, then 
some kind of final performance grade, indexing the levels of 
skill with which students ended the course, is the best choice 
for a course grade. Students’ practice work earlier in the 
semester, when they were developing skills and were free to 
make mistakes and to learn from them, should not count in 
the final grade. If a grade is meant to index a set of learning 
goals that were addressed over the course of a semester with
various assessments, some kind of averaging makes sense. 
But be aware that some information is lost in any average. 
Averages mask variations in individual grades, so two differ- 
ent sets of grades (say, A,C,F and C,C,C) can end up with the 
same final average. Instructors should be sure, before they
select an averaging method for calculating final grades, that 
overall performance on a set of learning goals measured with 
a set of assessments is what is called for. 
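
The point about information lost in an average can be checked with a few lines of arithmetic. The sketch below is illustrative only; it assumes the familiar 4-point values for letter grades, and the function name `summarize` is hypothetical. It shows the two grade sets mentioned above sharing a mean while differing sharply in spread.

```python
from statistics import mean, pstdev

# Assumed 4-point values for letter grades (A = 4 ... F = 0).
POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def summarize(grades):
    """Return (mean, population standard deviation) for a list of letter grades."""
    values = [POINTS[g] for g in grades]
    return mean(values), pstdev(values)

uneven = summarize(["A", "C", "F"])  # same mean as below, but widely spread
steady = summarize(["C", "C", "C"])  # same mean, no spread at all
print(uneven, steady)
```

Both sets average to a C, yet only the spread reveals that the first student's performance ranged from excellent to failing.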

If the purpose of each assignment is clear, if each assign- 
ment measures students’ achievement on one or more learn- 
ing goals for the course, and if the grade is intended to con- 
vey the sum or average of a set of achievements for the 
term, then a composite grade is in order. Putting grades 
together into a composite for a whole course is a matter of 
combining the scores on each assignment so that the relative 
weight they contribute to the final grade matches the in- 
structor’s intentions, the syllabus, and emphases communi- 
cated to students through the use of class time and instruc-
tional activities. When percentage of points earned on vari-
ous assignments is the basis for assigning grades, the weight 
an assignment carries in the final grade depends on its total 
possible points. Using total possible points is consistent with 
a criterion-referenced approach to grading, in which stu-
dents’ work is compared with standards of performance and 
one student’s grade does not depend on the grade of others. 

When students’ standing in the class is the basis for as- 
signing grades, the weight an assignment carries in the final 
grade depends on the variability of scores. Students’ perfor- 
mance on tests or assignments for which students all 
received similar scores does not affect students’ standing 
(rank) for final grades as much as students’ performance on 
tests or assignments on which scores varied widely. Using 
students’ standing as the principle for assigning grades, a 
norm-referenced approach comparing students to one an- 
other, is not consistent with the learning-centered approach 
recommended in this monograph. As noted earlier, students 
prefer, value, and work harder for learning that is assessed 
by comparison with clear standards. The logic of an instruc- 
tional model based on goals for learning also demands as- 
sessment by comparison with standards. The instructor’s
question in grading then becomes, “What portion of the 
achievement targets established as important in this course 
has each student attained?” Methods of assigning final grades 
described in this section are all compatible with a criterion- 
or learning-referenced approach to instruction and assess-
ment. (See Oosterhof, 1987, or Ory & Ryan, 1993, for exam-
ples of how to weight scores for a norm-referenced, student-
standing-based composite grade.) 

As a “score” or measure, a course grade should be a mea- 
sure of overall achievement or accomplishment of the learn- 
ing goals for the course. Course grades should not be a 
proxy for general intelligence, a measure of effort or inten- 
tions, or any other non-achievement-related factor — partly 
because of the context of higher education, which to a cer- 
tain extent is beyond instructors’ choices or wishes. The
result of whatever grading procedures are applied is a grade
or mark, stored in the student’s transcript next to the course
title and number, for a number of years. Long after the in-
structor may have left the institution, the student will have
Introduction to Western Civilization: B under his or her
name. Whatever the instructor intended that B to mean, it
will end up meaning “a B for achievement of whatever was
taught in Introduction to Western Civilization.”

Thus, the instructor’s challenge is to use a method to com- 
bine the various achievement scores that are to count in the 
course grade in such a way that they give the most meaning- 
ful and accurate information one grade can convey about 
overall achievement for the course. To do so, the instructor 
must consider what relative weights the various components 
should have and use a method that does in fact allow the 
various components to carry those weights into the final
composite grade. 

Comparability of Scales 

Tests or other assignments with grades that are added up as 
an accumulation of points or items correct, then calculated as 
a percentage of the total possible points, are said to be on an 
interval level scale. Thus, the space between each point or 
percentage is considered equal, and averaging is possible; 
that is, it is mathematically logical and defensible to add up 
the scores (or weighted scores) on different assignments and 
divide by the appropriate number to get a final average 
score. If the instructor has decided that a composite grade is
appropriate for the course, the instructor’s challenge for valid 
grading is in selecting a method of averaging that will give 
the intended weight for each assignment in the final average. 

Percentages or points imply much more precise measure- 
ment than may really exist. For example, it may not be true 
that a student who scores 84% on the final exam really un- 
derstands more of the course material or has demonstrated 
more thinking and reasoning skills with the material than a 
student who scores 83%. 

Percentage or point scales also result in unequal grade
ranges. Usually, the range for an F is very large, from 0
through 60% or 68%, and the ranges for A through D are 8,
9, or 10 points wide. When letter grades earned by finding
the percentage of total possible points are recorded as let-
ters, the implied 4-point scale (A = 4.00, B = 3.00, C = 2.00,
D = 1.00, F = 0.00) no longer reflects these ranges.

Rubrics with very few points, like the holistic scale in Table
11, do not have the mathematical qualities that allow them to
be meaningfully converted to percentages. Consider, for ex-
ample, a 5-point rubric on which 5 is excellent performance
with a description matching A-level work, 4 is good perfor-
mance with a description matching B-level work, and so on.
These rubric levels are rank-ordered levels that should not be
converted to percentages. If an instructor mistakenly tries to
do so, a score of 5 would end up as an A (5/5 = 100%), a 4
would be a B or C (4/5 = 80%) depending on the scale, and 3
or lower would be an F. The descriptions for the rubric levels 
would not match these grades; commonly, 3 would still de- 
scribe acceptable work, and sometimes 2 would be passing as 
well. If a set has enough analytical rubrics so that the total 
possible points is 25 or 30, the instructor can total the rubric 
scores and then calculate percentage. While this method is 
not entirely satisfactory from a mathematical point of view, 
the results can be useful for grading. 
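
The mismatch described above can be made concrete with a short sketch. It is illustrative only: the cutoffs assume a 92/85/76/69 percentage scale, the mapping of rubric levels to letters is a hypothetical reading of "and so on," and the function names are invented for the example.

```python
# Assumed percentage cutoffs (92-100 = A, 85-91 = B, 76-84 = C, 69-75 = D).
SCALE = ((92, "A"), (85, "B"), (76, "C"), (69, "D"))

# Hypothetical meaning of each rubric level, read from its written description.
RUBRIC_MEANING = {5: "A", 4: "B", 3: "C", 2: "D", 1: "F"}

def percent_to_letter(percent):
    """Return the letter grade for a percentage under SCALE."""
    for cutoff, letter in SCALE:
        if percent >= cutoff:
            return letter
    return "F"

def naive_rubric_letter(level, top=5):
    """The mistaken conversion: treat a rubric level as a fraction of the top level."""
    return percent_to_letter(100 * level / top)

for level in (5, 4, 3, 2, 1):
    print(level, RUBRIC_MEANING[level], naive_rubric_letter(level))
```

Under these assumptions a level of 3, whose description still matches acceptable C-level work, converts to 60% and therefore to an F.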

What should be done when the course assignments are 
graded using a mixture of different kinds of scales? What if 
tests are graded using percentage correct, but projects and 
papers are graded using rubrics, for all the good reasons 
noted earlier? The principle involved is mathematical preci- 
sion, the amount of detail that is actually present in a mea- 
sure. Height is normally measured to the nearest inch, for 
example, and can be measured to the nearest half-inch. But 
it is not meaningful to say that someone is 64.3046 inches 
tall. It is simply not possible to measure height that precisely. 

To make scales comparable, it is necessary to express all 
scores on the same scale. It is possible to collapse from more 
precision to less, but not the other way around. So, one way 
to combine percentage grades and rubric grades is to con- 
vert them all to letter grades, then use either the median or 
weighted letter grade approach to combining them (explained 
later) for the final grade. Percentages should be converted to
grades in accordance with the policy that has already been 
communicated to students, either in the syllabus or in a hand- 
book or institutional publication. Rubrics should be converted 
to grades in ways that are faithful to the meaning of the de- 
scriptions of the various levels of performance. 

Sometimes institutional policy determines grade/percent- 
age scales; other times instructors are free to choose their 
cutoff points. One example is 92-100 = A, 85-91 = B, 76-84
= C, 69-75 = D, 0-68 = F (Walvoord & Anderson, 1998). This
scale has many variations, including the familiar 90-100 = A,
80-89 = B, 70-79 = C, 60-69 = D, 0-59 = F. Although these
scales look very different, with the second one appearing
much “easier” than the first, it is important to remember that
these scales, and any institutional policy about them, beg
the question “percent of what?” Ninety percent of a difficult
assignment may be “harder” to achieve than 92% of a mod- 
erately easy assignment. Moreover, what is “easy” and “diffi- 
cult” depends on the specific context and in part on a stu- 
dent’s background and readiness to learn as well as the 
amount and quality of instruction the instructor has 
provided. 
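
As a small illustration of how the two scales above treat the same percentage, the following sketch applies both sets of cutoffs. The names `SCALE_A`, `SCALE_B`, and `to_letter` are hypothetical, and whole-number percentages are assumed.

```python
SCALE_A = ((92, "A"), (85, "B"), (76, "C"), (69, "D"))  # 92-100 = A example
SCALE_B = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))  # 90-100 = A variation

def to_letter(percent, scale):
    """Return the first letter whose lower cutoff the percentage reaches."""
    for cutoff, letter in scale:
        if percent >= cutoff:
            return letter
    return "F"

for percent in (91, 84, 68):
    print(percent, to_letter(percent, SCALE_A), to_letter(percent, SCALE_B))
```

The letters differ for each of these scores, but neither scale says anything about how demanding the underlying assignment was.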

If the instructor intends to convert all results of assign- 
ments, whether percentages or rubrics, to grades, the rubric 
levels must be written with the grading scale in mind. If five
different levels of distinction for the quality of students’ work 
are needed for the final grade (A, B, C, D, and F), then a 4- 
point rubric is not precise enough, because it does not allow 
enough different distinctions. The instructor should decide 
before grading how many different quality levels he or she 
needs to distinguish and then write the rubric accordingly. 

Methods for Combining Individual
Scores Into Course Grades 

Four methods for combining individual scores into course 
grades are presented here, with examples: the median 
method, weighted letter grades, total possible points, and
holistic grading. They serve different purposes, depending
on the course. In general, the weight of an individual grade
is the portion it contributes to the final grade for the course.
“Weight” is also used to mean a number an instructor might
choose to multiply by an individual grade to change the
portion it contributes to the final grade. For example, to
double a score’s contribution, multiply it by two. Two prin-
ciples are important here. First, grades for individual assign-
ments have weights relative to the final grade whether they
have been adjusted by multiplication weighting methods or
not. Second, if an individual assignment grade is multiplied
by a weight to adjust its contribution to the final grade, the
same procedure should be followed for all students.

Grading systems differ in the degree to which they are 
developmental, allowing early failure and practice, as op- 
posed to unit based, where each unit is important (Walvoord 
& Anderson, 1998). The holistic method is particularly useful
for courses where students work toward some final outcome
and should not be penalized for their early work and prac-
tice. The median method is particularly appropriate when
individual students’ performances vary widely over a semes-
ter, when scores are in grade or rubric form, or when grades 
are based on very few assignments. 

Grading methods should be selected in conjunction with 
planning a course, so that the relative weights assigned to
components of the grade match their coverage of intended 
learning goals in the same manner as the intentions for the 
course recorded on the syllabus. Weighted assignments 
should collectively index a “whole” that is a reasonable rep- 
resentation of achievement of intended course outcomes. 
Only a list of what each assessment represents can validate 
the choice of weights and demonstrate representation of 
learning outcomes. The lists of assessments without refer- 
ence to content in the examples below are only for the pur- 
pose of illustrating calculations. 

Median method 

One way to calculate final grades for a course is to use the 
median of the set of individual assignment grades — that is, 
the value that falls in the middle of the scores when they are 
arranged in order from high to low, or vice versa, and 
counted. For example, the median of the set of grades A, C, 
and F is a C, and the median of the set of grades A, C, and D 
is also a C. The median is a good way to capture “typical”
performance in a set of measures such as letter grades or 
rubrics. It is an excellent way to handle combining scores 
when some are grades from rubrics and some are grades from 
more precise scales that have been transformed to match. 

For courses with a relatively small number of graded as- 
signments, the median describes a student’s typical or “aver- 
age” achievement better than the mean or arithmetic average.
Extreme scores, say one A or one F in an otherwise B-/C-level
performance, do not pull the median as they would the mean
or “average,” thus giving students the freedom to do poorly
on one performance without terrible damage to the overall
grade. Some instructors allow for this occurrence in regular
averaging by allowing students to choose one grade to ig-
nore, but this method is unsatisfactory if the resulting final
grade does not then represent all valued instructional inten-
tions. When one grade is dropped, so is information about
achievement on at least one learning goal for the course. 

The median’s property of not being unduly influenced by 
extreme grades is especially good for courses with very
small numbers of grades, say a midterm, a final, and a pa-
per. If there is one unusual performance among three, it is
not possible to tell just how unusual it is. The extreme grade 
may even be more typical of the student than the other two. 

In the median method, to weight an assignment double, 
simply enter the grade in the list twice. Suppose a student 
received an A on a final exam, a B on a paper, and a C on 
the midterm. Suppose further that the instructor wanted the 
final and the paper each to count twice as much as the mid- 
term. This student’s grade lineup, after weighting, would be 
A, A, B, B, C, and the median and final course grade would 
be a B, the middle in this array of five grades.

In the median method, if there are an even number of 
grades in the array, the median is the grade between the 
two middle grades. For example, the median of A, A, C, D is 
a B, halfway between the middle A and C, and the median
of B, B, D, F is a C. The median of A, A, B, B is A- or B+; if
no minuses or pluses are allowed, the instructor would
need to decide whether to round up to A or down to B.
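
The median method, including double weighting and the even-count case, can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the function name `median_grade` is hypothetical, weights are assumed to be whole numbers, and when the midpoint falls between two letters the sketch reports both options and leaves the rounding decision to the instructor.

```python
ORDER = ["F", "D", "C", "B", "A"]  # letter grades from low to high

def median_grade(grades, weights=None):
    """Median letter grade; an integer weight enters a grade that many times."""
    weights = weights or [1] * len(grades)
    ranks = sorted(ORDER.index(g) for g, w in zip(grades, weights) for _ in range(w))
    if not ranks:
        raise ValueError("at least one weighted grade is required")
    n = len(ranks)
    if n % 2 == 1:
        return ORDER[ranks[n // 2]]          # odd count: the middle grade
    lo, hi = ranks[n // 2 - 1], ranks[n // 2]
    if (lo + hi) % 2 == 0:
        return ORDER[(lo + hi) // 2]         # even count: letter halfway between
    # Midpoint falls between two adjacent letters; the instructor must decide.
    below = (lo + hi) // 2
    return f"{ORDER[below + 1]}- or {ORDER[below]}+"

print(median_grade(["A", "B", "C"], weights=[2, 2, 1]))  # final, paper, midterm
print(median_grade(["A", "A", "C", "D"]))
print(median_grade(["A", "A", "B", "B"]))
```

The three calls reproduce the examples in the text: the weighted lineup A, A, B, B, C, the even set A, A, C, D, and the undecided case A, A, B, B.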

Weighted letter grades 

This method of calculating final grades uses the familiar 
scale A = 4.00, B = 3.00, C = 2.00, D = 1.00, F = 0.00. The 
system requires a decision when the course is planned
about what percentage of the final grade each individual 
grade will be. The equation for calculation is as follows: 

(Weight of grade 1)(Grade 1) + (Weight of grade 2)(Grade 2)
+ . . . + (Weight of grade n)(Grade n) = Final Grade

Table 16 illustrates this method and shows how two differ-
ent students’ final grades would be calculated.
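
The weighted letter grade equation can be checked against the two students in Table 16 with a short sketch. The grades and weights come from that table; the function names are hypothetical, and the rule for turning a grade-point result back into a letter (nearest whole grade, ties rounding up) is an assumption, since the text does not state one.

```python
POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def weighted_grade_points(components):
    """components: (letter, weight) pairs whose weights sum to 1.0."""
    total_weight = sum(weight for _, weight in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1.0")
    return sum(POINTS[letter] * weight for letter, weight in components)

def nearest_letter(points):
    """Assumed rounding rule: the letter whose value is closest to the result."""
    return min(POINTS, key=lambda letter: abs(POINTS[letter] - points))

# Tests 40%, field project 30%, final exam 20%, class participation 10%.
student_1 = weighted_grade_points([("B", .40), ("C", .30), ("B", .20), ("B", .10)])
student_2 = weighted_grade_points([("B", .40), ("F", .30), ("B", .20), ("B", .10)])
print(round(student_1, 2), nearest_letter(student_1))
print(round(student_2, 2), nearest_letter(student_2))
```

Note that Student 2's failing field project enters the sum as zero no matter how close to passing it was.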

Total possible points 

The point approach to grading assigns a range of points to 
each component of the final grade. The example presented 
in Table 17 (Walvoord & Anderson, 1998, p. 94) uses the
same relative weights as those in Table 16, and illustrates the
impact of failing grades. 

TABLE 16

The Weighted Letter Grade Method of Grading

Percentage of course grade           Student 1   Student 2
Test average, letter grade: 40%      B (3.00)    B (3.00)
Field project, letter grade: 30%     C (2.00)    F (0.00)
Final exam, letter grade: 20%        B (3.00)    B (3.00)
Class participation grade: 10%       B (3.00)    B (3.00)

Student 1’s grade = (3)(.40) + (2)(.30) + (3)(.20) + (3)(.10) = 2.70 = B
Student 2’s grade = (3)(.40) + (0)(.30) + (3)(.20) + (3)(.10) = 2.10 = C

The final course grade is determined by the percentage
scale set by the instructor or by department policy. This exam-
ple uses the 90-100 = A scale. Table 17 illustrates the calcula-
tion of final grades with the total possible points method for
the same two students, assuming their Bs and Cs were in the
middle of the possible point range for these grades. Table 17
also illustrates the main difference between weighted letter
grades and total possible points: the impact of an F. Recall that
in the example, Student 2 failed the field project. In Table 17,
Student 2’s grade is calculated with a high F (57% of the 30
possible points for the field project) and a low F (27% of the
30 possible points for the field project). Notice that the differ-
ence between a high and a low F on the field project makes
the difference between a C and a D for Student 2.

The point approach to grading accomplishes the same 
purpose as weighted letter grades, namely, to assign various
weights to the assignments in the final grade that are propor-
tional to their relative coverage of course outcomes. The
difference, mathematically, is that for weighted letter grades,
if the grades started as “percentage correct,” then scaling has
taken place before the final grade is calculated. It makes the
most difference in the F range, as illustrated. Any F average 
for any of the components in the weighted letter grade sys- 
tem counts simply as zero, while the points received for an F 
in the points system can range from 0 through 60% (or what- 
ever the cutoff is) of the total possible points. 
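
The total possible points calculation, and the effect of a high versus a low F, can be sketched as follows. The point values are those reported in Table 17 and the cutoffs are the 90-100 = A scale; the dictionary keys and the function name `course_grade` are hypothetical.

```python
MAX_POINTS = {"tests": 40, "field_project": 30, "final_exam": 20, "participation": 10}
SCALE = ((90, "A"), (80, "B"), (70, "C"), (60, "D"))  # 90-100 = A scale

def course_grade(earned):
    """Return (total points, letter) for a dict of points earned per component."""
    total = sum(earned.values())
    percent = 100 * total / sum(MAX_POINTS.values())
    for cutoff, letter in SCALE:
        if percent >= cutoff:
            return total, letter
    return total, "F"

student_1 = {"tests": 34, "field_project": 23, "final_exam": 17, "participation": 8}
student_2_high_f = dict(student_1, field_project=17)  # about 57% of 30 points
student_2_low_f = dict(student_1, field_project=8)    # about 27% of 30 points

for record in (student_1, student_2_high_f, student_2_low_f):
    print(course_grade(record))
```

Unlike the weighted letter grade method, the points earned on a failed component still count, so the two versions of Student 2 end the course with different letters.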

TABLE 17

Total Possible Points Grading

                                         Student 1   Student 2   Student 2
                                                     (high F)    (low F)
Tests (out of 40 points)                 34 (B)      34 (B)      34 (B)
Field project (30 points)                23 (C)      17 (F)      8 (F)
Final exam (20 points)                   17 (B)      17 (B)      17 (B)
Class participation (10 points)          8 (B)       8 (B)       8 (B)
Total (final grade, out of 100 points)   82 (B)      76 (C)      67 (D)

If all grades on individual assignments are handled by as-
signing percentage scores, a variation of the total possible
points approach will simplify calculations. Record everything
in the grade book as a percent of 100, which puts all the
grades on the same scale and, without further weighting, gives
them all the same weight in the final grade. A simple sum
indicates how many hundreds of points describe the total. For
example, if five assignments are all weighted equally, the
course grade can be computed on the basis of 500 points.
Weighting should be done before summing. In this example, if
one of the assignments should be worth twice as much as the
others, it should contribute 200 points, for a total of 600 for the
course, and students’ percentage scores on that assignment
should be multiplied by 2 before being added in for the total.
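
The percent-of-100 variation can be sketched in a few lines. The five scores below are invented purely to illustrate the arithmetic, and the function name `percent_total` is hypothetical.

```python
def percent_total(percent_scores, multipliers=None):
    """Sum percent scores (each out of 100) after applying any multipliers.

    Returns (points earned, points possible, overall percent).
    """
    multipliers = multipliers or [1] * len(percent_scores)
    earned = sum(score * m for score, m in zip(percent_scores, multipliers))
    possible = 100 * sum(multipliers)
    return earned, possible, 100 * earned / possible

scores = [80, 90, 70, 85, 95]                # hypothetical percent scores
equal = percent_total(scores)                # five equal assignments: 500 points
doubled = percent_total(scores, [1, 1, 1, 1, 2])  # last one counts twice: 600 points
print(equal, doubled)
```

Multiplying before summing keeps the doubled assignment's share of the 600 possible points at one third, as intended.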

The total possible points approach to grading means that 
the course has to be well planned in advance. The number 
and relative worth of assignments, and the resulting points 
for each assignment, must be established. A point system 
makes it difficult to adjust instruction and assessment based 
on students’ needs or unsuccessful instmction. Therefore, a 
point system is recommended for courses that are fairly well 
established and have been demonstrated to run fairly consis- 
tently over time. 

Grades as a holistic rating scale 

Grades can be written as a holistic rating scale or rubric, 
constructed in the same fashion as scoring schemes for indi-
vidual assignments, that is, with a description of the work
required for each level. Walvoord and Anderson’s (1998)
“definitional system” for assigning grades amounts to a holis-
tic scale. It assumes that each component of the grade is
important in its own right and may not be averaged in with
other work to compensate for it. Table 18 presents an exam-
ple of a definitional system of grading. This system essen-
tially makes a rubric description for each grade level that
specifies quantitatively what the minimal achievement levels
are in two categories.

TABLE 18

A Definitional System for Grading a Course

To receive a particular course grade, you must meet or exceed the standards for each
category of work. The following illustration shows a course with two distinct cate-
gories of work: graded work and pass-fail work.

Course grade   Graded work   Pass-fail work
A              A average     Pass for 90% or more of assignments
B              B average     Pass for 83% or more of assignments
C              C average     Pass for 75% or more of assignments
D              D average     Pass for 65% or more of assignments

Source: Walvoord & Anderson, 1998. Used with permission.
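
The non-compensatory logic of a definitional system can be sketched as a lookup over the rows of Table 18. The pass-rate thresholds come from that table; the function name is hypothetical, the graded-work average is assumed to be supplied already as a letter, and returning F when no row is satisfied is an assumption, since the table lists no row below D.

```python
ORDER = ["F", "D", "C", "B", "A"]  # letter grades from low to high

# Rows of Table 18: (course grade, minimum pass rate on pass-fail work).
PASS_RATE_NEEDED = (("A", 0.90), ("B", 0.83), ("C", 0.75), ("D", 0.65))

def definitional_grade(graded_average, pass_rate):
    """Highest grade for which BOTH categories meet or exceed the standard."""
    for letter, min_pass in PASS_RATE_NEEDED:
        meets_graded = ORDER.index(graded_average) >= ORDER.index(letter)
        if meets_graded and pass_rate >= min_pass:
            return letter
    return "F"  # assumption: Table 18 lists no row below D

print(definitional_grade("A", 0.95))
print(definitional_grade("A", 0.80))  # strong graded work cannot offset pass-fail work
```

In the second call an A average on graded work yields only a C for the course, because passing 80% of the pass-fail assignments meets the C standard but not the B standard.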

Another purpose of grading that a definitional system 
can serve is in the case when one main learning goal has
formed the focus of a course, as may be particularly appro- 
priate for upper-level courses that have as their major in- 
tended learning outcome a complex performance on some 
major project. For example, advanced seminars in a field 
often have as their major goal the student’s synthesis of a 
body of literature, perhaps in a major paper. Research semi- 
nars often have as their main goal the design, implementa- 
tion, and written description of some original research. 
Essentially, the grade for the course comes down to the 
grade for the major project, and it may best be described by 
spelling out the level of quality required. Table 19 presents 
an example for grading a literature review course. 

A holistic, definitional scale such as the one in Table 19 
could also be constructed for seminar courses in which dis- 
cussion and debate are the main vehicles for dealing with the 
information. In that case, students’ contributions of substance 
to the discussion, their preparation, and their considered re- 
sponses to others would be the focus of the grading rubric.

TABLE 19

A Grading Scale for an Advanced Seminar With a
Literature Review as Its One Major Goal

Grade

A  Major literature in the field has been located. Information is synthesized accord-
   ing to key principles or topics discussed in class or found in the literature. Rea-
   sonable conclusions have been drawn and are warranted from the results of the
   literature search. Writing is clear and readable.

B  Most major literature in the field has been located. Information is mainly synthe-
   sized by topic. Reasonable conclusions have been drawn and are warranted from
   the results of the literature search. Writing is clear and readable.

C  Some of the literature in the field has been located. Presentation may be list-like,
   reporting each piece of literature separately instead of synthesized by principle
   or topic. An attempt has been made to draw some conclusions from the list.
   Writing is mostly clear, although some points may be difficult to follow.

D  Some literature in the field has been located. Information is not organized. An
   attempt has been made to draw some conclusions, although the conclusions may
   not be supportable. Writing is not clear.

F  No literature is cited, or the literature cited is not relevant to the topic. Informa-
   tion is not organized. Writing is not clear.

A chapter on grading is included in the classic Teaching
Tips: Strategies, Research and Theory for College and Uni-
versity Teachers (McKeachie, 1994); it cites a holistic grading
rubric from 1950. Thus, the principle of establishing learning
goals at the outset, distinguishing their relative importance,
and making grades reflect their accomplishment is not by
any means a new recommendation:

A All major and minor goals achieved.

B All major goals achieved; some minor ones not.

C All major goals achieved; many minor ones not.

D A few major goals achieved, but student is not
prepared for advanced work.

E or F None of the major goals achieved. (Travers,
cited in McKeachie, 1994, pp. 109-110)

Achievement of major and minor goals would need to be 
indicated by performance on various assessments keyed to 
the goals. 

Summary 

Instructors should choose the grading method that will result 
in grades that convey information about achievement of 
learning goals for the course. It follows that two important
considerations in choosing a grading method should be the 
nature of the course learning goals themselves — especially 
whether intended course learning outcomes comprise a set
of several goals or one general developmental goal — and 
the nature of the individual assessments used to measure 
them. For courses with one general developmental goal, a 
holistic grading system is recommended. For courses with 
sets of learning goals that should be considered together in 
the final grade, the nature of individual assignment grades 
helps instructors determine an appropriate method for com- 
bining them. If only a few grades have been assigned, or if 
grades vary widely for individual students, the median
method is recommended. If a fair number of grades are on 
point or percentage scales, the weighted letter grade or total 
possible point methods can be used. 




GRADE DISTRIBUTIONS AND 
GRADING POLICIES 






This section concentrates on literature about issues related to 
grading as a faculty activity: fairness, students’ expectations,
grade inflation, and grading policies. Most higher education 
institutions have grading policies. Almost all have policies 
about what grades may be given. Is plus/minus grading 
allowed? Is an A+ possible? Which courses may be taken as 
pass/fail? Which courses must be taken as pass/fail? How 
will pass/fail grades be figured into the grade point average? 
Some institutions have policies about grade point averages 
required for admission into, retention in, or graduation from 
a program. Some have policies that specify the percentage 
equivalents for various grades, although that does beg the 
question “percentage of what?” 

The Literature on Grading: Course
Planning and Student Results

Tables 20 and 21 present the results of the literature review 
specific to grading. Table 20 presents essays and other de- 
scriptive analyses, Table 21 empirical studies. Clearly, one 
issue that stands out in the current literature about grading in 
higher education is a concern about grade inflation (see the 
following subsection), but it is worth noting that although the 
concern in the current literature is grade inflation, grade deflation
has also been reported as a problem in other periods
during this century (Milton, Pollio, & Eison, 1986).

Another theme in this review is the principle that well chosen
instructional goals, carefully planned syllabi, well executed
classroom instruction, and high quality interactions with students
are more important than the grades that result (Hammons
& Barnsley, 1992; Walvoord & Anderson, 1998). After
assignments have been turned in and the instructor is preparing
to grade them, it is too late to start worrying about grades.
The end of the term is certainly too late. Concern with “how
students do” should begin at the beginning, when instruction
is planned and the syllabus prepared.

Grade Inflation 

“Grade inflation” describes the condition that exists when
grades rise without accompanying gains in achievement.
Conversely, “grade deflation” describes the condition that
exists when grades fall without accompanying drops in 
achievement. If grades in a given program rise but students
actually learned more, either because students in the program
were abler than before or because the instruction was
better, that is not grade inflation.

TABLE 20

Essays About Grading in College Classrooms

Source: Basinger, 1997
Topic: Grade inflation masks other problems
Main points:
• Grade inflation is sometimes seen as an indication that
  academic standards have declined, with the assumption that
  faculty know how to teach and assess well; former grade
  distributions may also be related to “harder” but equally
  misguided teaching, such as requiring a large amount of
  memorization of facts.
• Instead of working on grade inflation, we should be working
  to ensure that appropriate content is selected, taught, and
  graded well. If a section receives high grades because they
  all achieved something, that’s fine.

Source: Brookhart, 1998
Topic: Grade inflation
Main points:
• Factors behind grade inflation include external pressure,
  internal pressure, confusion of judge and advocate roles in
  education, and change in model of education from
  information-transmission to objectives-driven instruction.
• Just holding firm addresses only external pressure.

Source: Dreyfuss, 1993
Topic: Grade inflation
Main points:
• Concerns about grade inflation mistakenly make the grade
  the object, when the object of education should be learning
  to think critically in a discipline.
• Would prefer clear criteria and then 2-part written
  evaluations (student and faculty).
• Keep in mind adult students’ needs.

Source: Hammons & Barnsley, 1992
Topic: Grading
Main points:
• Brief history of grading
• Defines 4 approaches to grading: norm, criterion, mastery,
  and pass/fail
• Gives principles to use in selecting a grading plan

Source: Mitchell, 1998
Topic: Grade variation
Main points:
• Inflation is a problem, contrary to popular belief not
  linked to course evaluations (assertion as department
  chair), but between-section variation of grades is a more
  serious problem.
• Need to establish clear criteria for performance and use
  grades to distinguish among levels of performance;
  discussion among faculty is a necessary first step but will
  not be sufficient to solve the problem.

TABLE 21

Empirical Studies of Grading in College Classrooms

Study: L. Cross, Frary, & Weber, 1993
Context: Large research university
Sample: 365 faculty
Method: Survey of grading practices
Findings:
• Most espouse absolute (versus relative) standards but
  don't always use methods accordingly.
• Record letter grades, percentage scores, and points mostly
  (more often percentage for tests than for papers, projects,
  or homework)
• Most penalize absence from an exam with an F for the exam.
• Don’t consider nonachievement factors much, except for
  borderline cases

Defining “grade inflation” in this way sidesteps a serious
philosophical issue about the purpose of education and the
relation between grading and the mission of schooling. Most
schools no longer espouse a mission of ranking students but 
rather declare a mission of ensuring students’ competence in 
reading, writing, problem solving, and the disciplines. Such a 
mission implies that instructors should not be satisfied with 
their teaching unless they can justifiably assign grades indi- 
cating a high level of competence for most of their students. 

A review of studies of grade inflation and deflation
(Milton, Pollio, & Eison, 1986) concluded that before the
early 1970s, grade deflation was a concern. Two studies, one
at the Women's College of the University of North Carolina
and one at the University of California at Berkeley, found
that in the late 1950s and early 1960s, SAT scores had risen
but grade point averages remained stable. Beginning in the
early 1970s, that trend reversed; studies from various schools
reported rising grade point averages. Milton, Pollio, and
Eison's analysis ended with 1980. They concluded that college
grades reflected different expectations for different eras and
that grades simply do not have a fixed value over time.

TABLE 21 (continued)

Study: Stone, 1995
Context: State of Tennessee
Sample: 13,703 seniors from 29 Tennessee colleges
Method: Synthesis of data from several sources and other
  institutional reports
Findings:
• Grade inflation has led to enrollment inflation, which has
  led to budget inflation.
• Study uses data from Tennessee but suggests it generalizes
  and recommends research in other states and at individual
  institutions.
• Administrators, not faculty, run institutions; suggests
  faculty “know how to teach” but are constrained from doing
  so.

Study: Summerville, Ridley, & Maris, 1990
Context: Small college
Sample: Survey of 116 urban/suburban institutions, using
  grade records from 1967-1976 and college database from
  1979-1986
Method: Plotted grades over time; compared grades for
  students taking courses in their department with the same
  students’ grades while simultaneously enrolled in other
  departments
Findings:
• Grade inflation occurred at the college.
• Serious local departmental differences in grading
• For 39 institutions providing departmental data, on average
  3 times as much variability between departments as between
  years

Study: Walhout, 1997
Context: 40-year teaching career from 1952-1992
Sample: 4,969 students
Method: Gives career grade distribution and grade averages by
  career decades
Findings:
• Has a C+ overall career average; discusses in light of
  current concerns for grade inflation

An analysis of grade distributions and ACT scores for 
students in higher education in Tennessee from 1965 to 1991 
(Stone, 1995) concluded that the overall grade distribution in 
the state had shifted up 0.5 (on the 4-point grading scale 
where A = 4.0) such that about 15% of 1991 college graduates
in Tennessee would not have graduated in 1965. Stone
argued that the “lowered standards” apparent in this grade
inflation were largely the result of enrollment-driven funding
and administrative priorities that worked against holding
students to rigorous academic standards.

Summerville, Ridley, and Maris (1990) found evidence of
grade inflation from 1967 through 1986. They studied an un- 
named institution in detail and surveyed peer institutions for 
benchmark information. Although they found evidence of 
grade inflation, they found that differences in grading 
among academic departments were far more dramatic and, 
in their view, cause for greater concern. Because students’ 
grades from across university departments are convention- 
ally averaged to compute grade point averages, even "grade 
inflation" figures actually mask the variability in grades. And 
to the extent that grades are not comparable from course to 
course, it does not make mathematical sense to average 
them — calling into question the meaning of the ubiquitous 
grade point average. 
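
A short Python sketch shows how averaging can hide between-department differences. The department names, years, and mean grades are entirely hypothetical and are not data from Summerville, Ridley, and Maris; the function name is also an assumption. The sketch compares the spread of department means with the spread of yearly means.

```python
from statistics import mean, pvariance

# Hypothetical mean course grades (4-point scale) by department and year.
grades = {
    "Dept X": {1984: 2.4, 1985: 2.5, 1986: 2.5},
    "Dept Y": {1984: 3.0, 1985: 3.0, 1986: 3.1},
    "Dept Z": {1984: 3.5, 1985: 3.6, 1986: 3.6},
}


def spread_by_department_and_year(table):
    """Return (variance of department means, variance of yearly means)."""
    years = sorted({y for by_year in table.values() for y in by_year})
    dept_means = [mean(by_year.values()) for by_year in table.values()]
    year_means = [mean(table[d][y] for d in table) for y in years]
    return pvariance(dept_means), pvariance(year_means)


between_depts, between_years = spread_by_department_and_year(grades)
print(between_depts > between_years)  # True for this made-up data
```

In this invented table the yearly means creep upward only slightly, yet a B in one department and a B in another plainly do not mean the same thing, which is the comparability problem the authors raise.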

Thus, the studies reveal at least two different approaches
to grade inflation: considering it a serious general problem or 
considering it camouflage for the more serious problem of the 
apparent comparability of achievement levels from course to 
course that grades imply. Four issues underlie the problem of 
grade inflation: external pressure, internal pressure from in- 
structors on themselves, a confusion of the roles of judge and 
advocate in our educational system, and a change in the mission
of education and model of instruction whose effects the
grades are designed to measure (Brookhart, 1998).

External pressure is the common notion that in a higher
education market increasingly driven by consumers’ needs
and wishes, instructors are under pressure from students and
parents to give high grades. The emotional tone in some of
the literature reviewed here testifies to the fact that some
instructors do feel this pressure. The entertainment-style
approach to education (that is, breaking down difficult
concepts into small bites that “anyone” can swallow) is a mixed
blessing, however. A fine line exists between accepting any
old work and honoring anyone’s work, and educators are
more and more pressed to draw that line as more and more
consumers buy into higher education. Where and how to
draw that line is a negotiable issue.
Internal pressure refers to the fact that many instructors
dislike grading (Hammons & Barnsley, 1992), especially because
they want to encourage their students. Some do not
fail students who “try” or who at least are perceived to be
trying. This reluctance to fail students suggests the third
issue, confusion of the role of judge and advocate in the
educational system. The same instructor who is the teacher,
the coach, the guide, the advocate “for” the student, must
turn around and judge the student. In a court of law or on
the playing field, this culture separates judges and advocates,
referees and coaches. But in education the same person
must fulfill both functions. It is a difficult task made
even more difficult by the fact that the function of advocacy
is more closely related to the reasons people give for becoming
educators in the first place (to help students learn
and grow) than is the function of judgment. So the instructor’s
motivation to judge or grade is less compelling than the
motivation to help, to guide, to coach, to teach.

The fourth issue behind grade inflation is a change in the
nature of the educational mission and the population it
serves. As more and more people need and want to attend
institutions of higher education, it is less and less useful to
“sort” people based on passing and failing and more and
more important to ensure that students can attain specified
levels of achievement. The information-transmission model
of education, in which the student’s job is to receive information,
has given way to a goal-driven model of instruction,
in which the instructor is seen as responsible for specifying
learning goals for his or her classes, arranging instructional
activities aimed at students’ progress toward those goals, and
then assessing the outcome. Short of misbehavior or intentionally
ignoring instruction, most students who are given
objectives and the means to meet them will do so. The resulting
grade distribution will not be a normal curve, with
most people scoring a C, but a skewed distribution with an
average grade of B.



Summary 

Faculty who clearly understand student assessment can both
communicate and measure their expectations for students.
Thus, faculty can foster students’ success without lowering
standards. The literature suggests that concern over grade
inflation is widespread and that grade inflation may be tied
to political and funding pressures on institutions as much as
to individual students' pressure on their instructors. The
literature also suggests, however, that an excessive focus on
grade inflation at the aggregated level (grade point average)
diverts educators’ attention from the more important matter
of the meaning and soundness of individual course grades
and the quality of classroom teaching and learning.
CONCLUSIONS AND FURTHER
RESOURCES FOR FACULTY



Conclusions 

The intent of this monograph is to provide college and university
instructors with a working knowledge of classroom
assessment principles and an overview of the literature 
about classroom assessment in higher education. Assess- 
ment means gathering and interpreting information about 
students' achievement of the learning goals for a course. 
Assessment of students serves several important purposes in 
higher education: feedback to students about their progress, 
information upon which to base grades for the course, and 
evaluation of programs. 

Each of the four ways of measuring achievement of
course objectives discussed (paper-and-pencil tests, performance
assessments, oral questions, and portfolios) is particularly
well suited to collecting useful information for instructors
and students about different kinds of achievement,
including knowledge, thinking skills, procedural skills, projects
and products, and dispositions. Scoring any of these
forms of assessment may be objective (right/wrong answers
or yes/no to items on a checklist) or subjective (judgments
of gradations of quality) (see “Options for Classroom Assessment”
for how to construct and score the various kinds
of assessments).

Because assessment of students provides information to 
support students’ motivation and learning and to monitor 
instruction, it is vital that information gleaned from assess- 
ment be meaningful and appropriate (valid) and accurate 
and dependable (reliable). To ensure the quality of such 
information, match assessment methods to the course’s 
learning goals, write clear and unambiguous tests, direc- 
tions, and assignments, and make sure the scoring and 
weighting of results match intended goals for the course. 
Scores should be intended for a particular purpose, and they 
should be accurate. Combining scores from several different 
assessments into meaningful course grades requires planning.
Attention must be given to the weighting of various
scores that go into the grade and to their method of combi- 
nation so that the final grade reflects the relative emphasis 
of intended goals for the course. 

A relatively larger amount of classroom assessment literature
has been written for K-12 education than for higher education,
and although the principles for good classroom assessment
remain the same for both levels, this monograph concentrates
on assessment for college courses and young (and
not-so-young) adult students. If anything, the higher education
context underscores the importance of fairness and clarity in
tests, assignments, and scoring and of clear descriptions of
achievement targets or learning goals. Today, when a wide
variety of students attend college, learning should be more of
a concern than grading. It is not a matter of flunking out those
who do not belong in higher education; rather, it is a matter
of helping each student achieve the level he or she is capable
of. Good assessment provides students the feedback they
need to monitor their progress and provides the instructor a
vehicle through which to fairly grade students’ achievement.

Few empirical studies of classroom assessment in higher
education have been completed. A much greater proportion
of the assessment literature for higher education is focused
on institutional assessment of outcomes and on anonymous
classroom assessment techniques that provide feedback for
the instructor than on classroom assessment of individual
students' achievement. Although all three functions are important,
more studies are needed that investigate the needs,
types, results, and effectiveness of assessment in the higher
education classroom.

Also needed is more professional development in assessment
geared toward higher education instructors. The preparation
and experience of most college professors has been
largely disciplinary; the methods of education are not part of
their training. In particular, assessment in higher education is
often treated as part of academic freedom or the instructor's
prerogative. But instructors who attend to assessment have
reported their results to be good and have increased both
students’ learning and their own excitement about their
teaching. When instructors can see evidence of students’
learning and have confidence that their evidence is solid,
both students and instructors benefit.

Further Resources 

Higher education instructors who are convinced by the argument
of this monograph that assessment of students' achievement
in university courses is a crucial teaching function and
a professional responsibility may wish to pursue professional
development in classroom assessment. The following paragraphs
identify such resources.
Course planning and grading

Effective Grading (Walvoord & Anderson, 1998) contains
some excellent suggestions for grading. It begins at the beginning,
however, with course planning, and suggests that
the assessments and instruction be planned to match the
professor’s instructional intent for the course and that the
syllabus clearly communicate these intentions to students.
Grading is a natural extension of good instruction, communicating
the total of a semester’s worth of achievement.

This resource is recommended first in this section because,
of all the resources surveyed, it is the single best
source. It is comprehensive and clear, and its themes and
examples are all consistent with principles of sound instruction
and sound assessment. It is written with a wealth of
examples from higher education across a wide variety of
disciplines: art history, biology, business, dental hygiene,
English, mathematics, to name a few. Its approach to grading
is criterion referenced. The authors are themselves both
college professors. If only one additional resource on assessment
of students in higher education could be recommended,
it would be this one.

Writing good tests 

Many educational measurement and classroom assessment
textbooks give good, clear, extended treatment of writing
test items. Recommended in particular are Educational
Assessment of Students (Nitko, 1996), especially good for its
excellent treatment of writing and scoring essay questions,
and Measurement and Assessment in Teaching (Linn &
Gronlund, 1995), especially good for its treatment of writing
multiple-choice questions. The intended audience for these
texts includes teachers at all grade levels, but readers will
find the examples applicable to college level study. Other
helpful textbooks on classroom assessment include Student-Centered
Classroom Assessment (Stiggins, 1997) and Educational
Testing and Measurement (Kubiszyn & Borich, 1993).
Tips for Improving Testing and Grading (Ory & Ryan, 1993)
is intended for higher education faculty.

Instructors who regularly teach the same courses may
save time and increase the validity and variety of test questions
if they develop an “item bank,” that is, a database of
test questions, indexed by topic and sometimes by level of
thinking. Although item banks can be saved in paper files,
usually the term refers to a computer file produced with
item banking software. Once written and indexed, items can
be stored and used in various combinations for different
tests, year after year. A good introductory resource for instructors
who wish to learn more about item banking is
“Guidelines for the Development of Item Banks” (Ward &
Murray-Ward, 1994).
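
The idea of an item bank indexed by topic and level of thinking can be sketched in a few lines of Python. This is a minimal illustration, not a description of any item banking software: the class names, field names, level labels, and sample questions are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    topic: str
    level: str  # assumed labels, e.g. "recall" or "reasoning"


class ItemBank:
    """A tiny in-memory item bank indexed by topic and thinking level."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def find(self, topic=None, level=None):
        """Return all items matching the given topic and/or level."""
        return [
            i for i in self._items
            if (topic is None or i.topic == topic)
            and (level is None or i.level == level)
        ]

    def draw(self, n, topic=None, level=None, seed=None):
        """Pick n distinct matching items at random to assemble a test."""
        pool = self.find(topic, level)
        if n > len(pool):
            raise ValueError("not enough matching items in the bank")
        return random.Random(seed).sample(pool, n)


# Hypothetical items for an assessment course.
bank = ItemBank()
bank.add(Item("Define validity.", "validity", "recall"))
bank.add(Item("Critique the validity of this quiz.", "validity", "reasoning"))
bank.add(Item("Define reliability.", "reliability", "recall"))

print(len(bank.find(topic="validity")))      # 2
print(len(bank.draw(1, level="recall")))     # 1
```

Because each item carries its own index labels, the same bank can yield different combinations of questions for different tests from term to term.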

Designing performance assessments 

An excellent how-to article about designing performance assessments
based on observation and judgment of the process
or products of students’ work is “Design and Development of
Performance Assessments” (Stiggins), an instructional
module published by the National Council on Measurement in
Education (NCME). Stiggins’s text (1997) is also an excellent
source of information on this topic. Again, examples from
basic education would not be hard to extend to higher education.
Other NCME instructional modules that might be helpful
include “Assessing Student Achievement With Term Papers
and Written Reports” (Brookhart, 1993) and “Using Portfolios
of Student Work in Instruction and Assessment” (Arter &
Spandel, 1992). A Practical Guide to Alternative Assessment
(Herman, Aschbacher, & Winters, 1992) is a short, readable
book that functions just as its title suggests, while Strategies for
Diversifying Assessment in Higher Education (S. Brown, Rust,
& Gibbs, 1994) is a workbook-style presentation that is
aimed specifically at higher education classrooms.

The Northwest Regional Educational Laboratory (NWREL)
publishes a wonderful resource for those who must plan and
lead faculty development in alternative assessment. The two
editions of A Toolkit for Professional Developers (1994, 1998)
include plans for workshop sessions of varying lengths in all
aspects of alternative assessment. In addition, NWREL has a
comprehensive and helpful Web site with a large amount of
space devoted to assessment (http://www.nwrel.org).

Colleagues 

Higher education faculty sometimes miss out on the opportunity
to talk with colleagues and share ideas. The nature of
the job makes scheduling, meeting, and simply finding time
somewhat difficult. But some of the most interesting and
useful information comes from sharing ideas that worked
(and those that did not) with colleagues.
On many campuses, faculty can turn to local university
exam services for technical help in developing assessments,
scoring them, and analyzing their quality. Readers are encouraged
to find out what services are available on their
own campuses.

Other resources 

Several resources include some ideas that could be adapted
to classroom assessment of individual students’ achievement
in higher education, even though it is not their primary purpose.
Sometimes these “sources of inspiration” are very important,
because they stretch the faculty member’s mind and
stimulate original thinking about course content, the students,
and the context of the class.

One such resource is Classroom Assessment Techniques: A
Handbook for College Teachers (Angelo & Cross, 1993).
These techniques are intended for assessment of a group’s
understanding of classroom lessons and are intended to be
administered anonymously. But the pages contain a wealth of
ideas, many of which can be successfully adapted to nonanonymous
student assessment of course achievement for credit.
Other classroom assessment techniques could be adapted and
modified to be suitable for nonanonymous student assessment.

Another resource is the American Association for Higher
Education’s Learning Through Assessment: A Resource Guide
for Higher Education (Gardiner, Anderson, & Cambridge,
1997), a compilation of resources, including a large annotated
bibliography on assessment. Its focus is institutional
assessment and program evaluation, so its concern is more
standardized assessment at levels aggregated beyond the
classroom. Some of the general resources, however, are
useful in classrooms.

A good source of ideas about classroom assessment
comes in the periodicals targeted to college teaching.
Sometimes such journals include articles about assessment;
sometimes they include articles about instruction that an
assessment-literate instructor can see have implications for
assessment. Journals addressing general college teaching
include College Teaching and Journal on Excellence in
College Teaching.

Discipline-specific teaching journals are good sources of
instructional and assessment strategies suitable to individual
disciplines. Often, they are published by professional organizations
in the field. Some include suggestions for both
high school and college classrooms. Such journals are
highly recommended as a source of interesting, novel, and
discipline-appropriate ideas.

In sum, readers are encouraged not to stop here. Instead,
they are urged to think about ways that the ideas
in this review apply to courses they teach, to try various
strategies of assessment, and to reflect on the results. And
they are encouraged to consult the resources offered here
for further study.